libp2p spec #19

Merged
merged 38 commits into from
Oct 29, 2015
9d2e6e6
add initial DHT spec with some questions
daviddias Jun 26, 2015
1feb2cf
add initial DHT spec with some questions
daviddias Jun 26, 2015
2135736
add ping
daviddias Jun 26, 2015
751c906
provinding segment
daviddias Jun 26, 2015
0e260f7
put question
daviddias Jun 26, 2015
dad0bb7
separate DHT, discovery and routing specs
daviddias Jun 30, 2015
091817b
landing ideas
daviddias Jun 30, 2015
aae56f2
layers
daviddias Jul 6, 2015
2546f85
make swarm part of the network layer and put NAT traversal, connectio…
daviddias Jul 6, 2015
a601ddc
some notes on swarm
daviddias Jul 6, 2015
61efc28
Merge branch 'master' of github.com:ipfs/specs into protocol-spec
daviddias Jul 17, 2015
61bfe39
add chapters numbers to sections on the readme so it is more clear ho…
daviddias Jul 18, 2015
c67ca71
moved the network layer part to its own spec/readme
daviddias Jul 18, 2015
6c508c6
update network spec to latest
daviddias Jul 18, 2015
1c17e08
remove layers.md
daviddias Jul 18, 2015
5a108f6
remove outdated protocol overview
daviddias Jul 18, 2015
b503949
add myself as an author, normalize how authors are written
daviddias Jul 19, 2015
fea3fde
merge wire in, first steps on the libp2p full spec
daviddias Jul 24, 2015
7117a9b
moar stuff
daviddias Jul 24, 2015
52cc3fb
add bits to the discovery section
daviddias Jul 26, 2015
5713fbe
restructure + intro + architecture
daviddias Aug 23, 2015
ec1cea9
try relative path
daviddias Aug 23, 2015
29b83fc
try relative path
daviddias Aug 23, 2015
bf3e6de
fix relative links
daviddias Aug 23, 2015
d7faeac
add implemenations list
daviddias Aug 23, 2015
26cef2d
add more juice on the architecture chapter
daviddias Aug 25, 2015
9bd5e69
add stream muxing and protocol muxing notes
daviddias Sep 13, 2015
4fe489a
add pseudo logo
daviddias Sep 27, 2015
2a51b32
update requirements references and add .6 and .7
daviddias Oct 1, 2015
f94e004
add content to chapter 2
daviddias Oct 1, 2015
33000b3
add Juan's review
daviddias Oct 1, 2015
cfddc58
add onion to the list of addrs
daviddias Oct 1, 2015
6a119c6
add cjdns
daviddias Oct 1, 2015
0377fd7
remove way old /routing spec
daviddias Oct 2, 2015
11bd8ff
update swarm interface
daviddias Oct 2, 2015
8d6939b
readd the secure communications to properties
daviddias Oct 2, 2015
ad849da
break discovery into its own thing
daviddias Oct 2, 2015
39c6b67
add ref to rfc5128
daviddias Oct 11, 2015
57 changes: 32 additions & 25 deletions protocol/README.md
@@ -1,22 +1,32 @@
# IPFS Protocol Spec (WIP!)
IPFS Protocol Spec
==================

Authors: [@jbenet](http://github.com/jbenet)
> **This spec is a Work In Progress (WIP)**

Authors:

- [Juan Benet](https://github.com/jbenet)
- [David Dias](https://github.com/diasdavid)

Reviewers:

* * *

This [spec](../) document defines the IPFS protocol stack, the subsystems, the
This spec document defines the IPFS protocol stack, the subsystems, the
interfaces, and how it all fits together. It delegates non-interface details
to other specs as much as possible. This is meant as a top-level view of the
protocol and how the system fits together.


Note: this document is not meant to be an introduction to the concepts in IPFS
and is not recommended as a first pass to understanding how IPFS works. For
that, please refer to the [IPFS paper](http://static.benet.ai/t/ipfs.pdf).

## IPFS and the Merkle DAG
# Index

- []()
- []()

## 1. IPFS and the Merkle DAG

At the heart of IPFS is the MerkleDAG, a directed acyclic graph whose links
are hashes. This gives all objects in IPFS useful properties:
@@ -41,7 +51,7 @@ publish, distribute, serve, and download merkledags. It is the authenticated,
decentralized, permanent web.


## Nodes and Network Model
## 2. Nodes and Network Model

The IPFS network uses PKI based identity. An "ipfs node" is a program that
can find, publish, and replicate merkledag objects. Its identity is defined
@@ -54,7 +64,7 @@ nodeID := multihash(publicKey)

TODO: constraints on keygen.
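
As an illustration of `nodeID := multihash(publicKey)`, here is a minimal sketch in Python. It assumes the public key is already available as raw bytes (the spec text here does not say how keys are serialized), uses sha2-256 as the hash function, and renders the result in base58. The function names are hypothetical and not part of any IPFS library.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I, or l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def multihash_sha256(data: bytes) -> bytes:
    """Prefix a sha2-256 digest with its multihash code (0x12) and length (0x20)."""
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest


def base58_encode(raw: bytes) -> str:
    """Encode bytes as base58, keeping leading zero bytes as '1' characters."""
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def node_id(public_key: bytes) -> str:
    """Hypothetical helper: derive a printable node ID from public key bytes."""
    return base58_encode(multihash_sha256(public_key))


if __name__ == "__main__":
    # The key bytes below are a placeholder, not a real public key.
    print(node_id(b"placeholder public key bytes"))
```

Because the two-byte prefix is fixed, IDs derived this way share the familiar `Qm` prefix seen in IPFS addresses.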

### multihash and upgradeable hashing
### 2.1 multihash and upgradeable hashing

All hashes in ipfs are encoded with
[multihash](https://github.com/jbenet/multihash/), a self-describing hash
@@ -75,7 +85,7 @@ sha3
```


## The Stack
## 3. The Stack

IPFS has a stack of modular protocols. Each layer may have multiple
implementations, all in different modules. This spec will only address the
@@ -94,7 +104,7 @@ IPFS has five layers:

These are briefly described bottom-up.

### Network -- connecting to peers
### [3.1 Network](network)

The **network** provides point-to-point transports (reliable and unreliable)
between any two IPFS nodes in the network. It handles:
@@ -105,7 +115,7 @@ between any two IPFS nodes in the network. It handles:

See more in the [network spec](network).

### Routing -- finding peers and data
### [3.2 Routing -- finding peers and data](routing)

The IPFS **Routing** layer serves two important purposes:
- **peer routing** -- to find other nodes
@@ -122,9 +132,9 @@ of implementations. For example:
to one of a set of supernodes. This is roughly like federated routing.
- **dns:** ipfs routing could even happen over dns.

See more in the routing spec (TODO).
See more in the [routing spec](https://github.com/ipfs/specs/tree/master/protocol/routing).

### Block Exchange -- transfering content-addressed data
### [3.3 Block Exchange -- transferring content-addressed data](exchange)

The IPFS **Block Exchange** takes care of negotiating bulk data transfers.
Once nodes know each other -- and are connected -- the exchange protocols
@@ -137,7 +147,7 @@ of implementations. For example:
of BitTorrent to work with arbitrary (and not known a priori) DAGs.
- **HTTP:** a simple exchange can be implemented with HTTP clients and servers.

### Merkledag -- making sense of data
### [3.4. Merkledag -- making sense of data](../merkledag)

[As discussed above](#IPFS-and-the-Merkle-DAG), the IPFS **merkledag** is the
datastructure at the heart of IPFS. It is an
@@ -170,7 +180,7 @@ on top of the merkledag, such as:

See more in the merkledag spec (TODO).

### Merkledag Paths
### [3.4.1 Merkledag Paths](../merkledag)

The merkledag is enough to resolve paths:

@@ -186,7 +196,7 @@ See more in the path resolution spec (TODO).

![](../media/ipfs-resolve/ipfs-resolve.gif)

### Naming -- PKI namespace and mutable pointers
### [3.5 Naming -- PKI namespace and mutable pointers]()

IPFS is mostly concerned with content-addressed data, which by nature
is immutable: changing an object would change its hash -- and thus its
@@ -209,7 +219,7 @@ See more in the naming spec (TODO).



## Applications and Datastructures -- on top of IPFS
## [4. Applications and Datastructures -- on top of IPFS]()

The stack described so far is enough to represent arbitrary datastructures
and replicate them across the internet. It is also enough to build and
@@ -222,7 +232,7 @@ them to the rest of the world using any of the tools that understand IPFS.
See more in the datastructures and applications specs (TODO).


### unixfs -- representing traditional files
### [4.1 unixfs -- representing traditional files]()

The unix filesystem abstractions -- files and directories -- are the main way
people conceive of files in the internet. In IPFS, `unixfs` is a datastructure
@@ -235,7 +245,7 @@ to carry over information like:
See more in the unixfs spec (TODO).


## Lifetime of fetching an object.
## [5 Lifetime of fetching an object.]()

Suppose we ask an IPFS node to retrieve

@@ -253,11 +263,7 @@ Then, the IPFS node resolves the components.
The first component in an `/ipfs/...` path is always a multihash.
The rest are names of links, to be resolved into multihashes.
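
The resolution rule above (first component is a multihash, the rest are link names) can be sketched as follows. This is a minimal sketch: the in-memory dictionary stands in for fetching objects from the network, and `resolve_path` and the `Qm...` keys are hypothetical placeholders, not real hashes or a real API.

```python
from typing import Dict

# Hypothetical in-memory store: object multihash -> {link name -> child multihash}.
Dag = Dict[str, Dict[str, str]]


def resolve_path(dag: Dag, path: str) -> str:
    """Walk an /ipfs/<multihash>/<name>/... path and return the final multihash."""
    parts = [part for part in path.split("/") if part]
    if len(parts) < 2 or parts[0] != "ipfs":
        raise ValueError("expected a path of the form /ipfs/<multihash>/...")
    current = parts[1]  # the first component is always a multihash
    for name in parts[2:]:  # the rest are link names resolved one hop at a time
        links = dag.get(current)
        if links is None or name not in links:
            raise KeyError(f"no link named {name!r} under {current}")
        current = links[name]
    return current


if __name__ == "__main__":
    example_dag = {
        "QmRoot": {"docs": "QmDocs"},
        "QmDocs": {"readme.md": "QmReadme"},
    }
    print(resolve_path(example_dag, "/ipfs/QmRoot/docs/readme.md"))
```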





## IPFS User Interfaces
## [6 IPFS User Interfaces]()

IPFS is not just a protocol. It is also a toolset. IPFS implementations
include various tools for working with the merkledag, how to publish
@@ -271,9 +277,10 @@ design and implementation. Examples:
- The IPFS libs - implementations in various languages
- The IPFS gateways - nodes in the internet that serve HTTP over IPFS

## ~~WIP~~
* * *

### WIP Stack Dump:

WIP Stack Dump:
- How the layers fit together
- How they call on each other
- Mention all the ports
30 changes: 30 additions & 0 deletions protocol/network/1-introduction.md
@@ -0,0 +1,30 @@
1 Introduction
==============

While building IPFS, the InterPlanetary File System[?], we learned about the challenges of running a distributed file system on top of heterogeneous devices with different network setups and capabilities. During this process, we had to revisit the whole network stack and devise solutions to overcome the obstacles imposed by the design decisions of its several layers and protocols, without breaking compatibility or recreating technologies.

To build this library, we focused on tackling problems independently, creating simpler solutions with powerful abstractions that, when composed, offer an environment in which a peer-to-peer application can work successfully.

## 1.1 Motivation

`libp2p` is the result of our collective experience building a distributed system. It puts developers in charge of deciding how their application interoperates with others in the network, favoring configuration and extensibility over assumptions about the network setup.

In essence, a peer using libp2p should be able to communicate with another peer using different transports, including connection relay, and talk over different protocols, negotiated on demand.

## 1.2 Goals

Our goals for the libp2p specification and its implementations are:

- Enable the use of various:
  - transports: TCP, UDP, SCTP, UDT, uTP, QUIC, SSH, etc.
  - authenticated transports: TLS, DTLS, CurveCP, SSH
- Efficient use of sockets (connection reuse)
Member

also:

  • efficient use of underlying transport (e.g. native stream muxing, native auth, etc)

- Enable communications between peers to be multiplexed over one socket (avoiding handshake overhead)
- Enable multiple protocols, and their respective versions, to be used between peers via a negotiation process
- Be backwards compatible
- Work in current systems
- Use current network technologies to the best of their capability
- Have NAT Traversal
- Enable connections to be relayed
- Enable encrypted channels
- Efficient use of underlying transport (e.g. native stream muxing, native auth, etc)
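
To make the negotiation goal above more concrete, here is a minimal sketch of one possible selection rule: the dialer proposes protocols in preference order and the listener accepts the first one it supports. This only illustrates the idea and is not the wire protocol libp2p defines; the `negotiate` function and the protocol names are hypothetical.

```python
from typing import Iterable, Optional, Set


def negotiate(dialer_preferences: Iterable[str],
              listener_supported: Set[str]) -> Optional[str]:
    """Return the first protocol the dialer proposes that the listener supports."""
    for protocol in dialer_preferences:
        if protocol in listener_supported:
            return protocol
    return None  # no common protocol: the caller decides how to fail


if __name__ == "__main__":
    # Hypothetical protocol identifiers carrying a version suffix.
    proposals = ["/example/2.0.0", "/example/1.0.0"]
    supported = {"/example/1.0.0", "/ping/1.0.0"}
    print(negotiate(proposals, supported))
```

Listing versions as separate identifiers lets a peer fall back to an older version without a dedicated version handshake.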
78 changes: 78 additions & 0 deletions protocol/network/2-state-of-the-art.md
@@ -0,0 +1,78 @@
2 An analysis of the State of the Art in Network Stacks
=======================================================

This section analyzes the protocols and architectures available for a network stack. The goal is to provide the foundations from which to draw conclusions and to understand libp2p's requirements and architecture.

## 2.1 The client-server model

In the client-server model, the two parties at the ends of the channel have different roles: they support different services and/or have different capabilities or, in other words, speak different protocols.

Building client-server applications has been the natural tendency for a number of reasons:

- Bandwidth inside a datacenter is considerably higher than the bandwidth available for clients to connect to each other
- Datacenter resources are considerably cheaper, due to efficient usage and bulk stocking
- It makes it easier for developers and system administrators to keep fine-grained control over the application
- It reduces the number of heterogeneous systems to be handled (although that number is still considerably high)
- Systems like NAT make it really hard for client machines to find and talk with each other, forcing a developer to perform very clever hacks to traverse these obstacles
- Protocols started to be designed with the assumption that a developer will create a client-server application from the start

We even learned how to hide all of the complexity of a distributed system behind gateways on the Internet, using protocols that were designed to perform a point-to-point operation, such as HTTP, which leaves the application unable to see or understand the cascade of service calls made for each request.

`libp2p` moves from client-server interactions towards dialer-listener interactions, in which it is not implied which of the entities, dialer or listener, has which capabilities or is allowed to perform which actions. Setting up a connection between two applications today is a multilayered problem, and these connections should not be biased towards one purpose; instead, they should allow several other protocols to work on top of the established connection. In a client-server model, a server sending data without a prior request from the client is known as a push model, which typically adds complexity. In a dialer-listener model, both entities can perform requests independently.
Member

👍 very nicely said

Member Author

Thank you :):)


## 2.2 Categorizing the network stack protocols by solutions
Member

maybe a note describing why this is here:

Before diving into the libp2p protocols, it is important to understand the large diversity of protocols already in wide use and deployment that help maintain today's simple abstractions. For example, when one thinks about an HTTP connection, one might naively just think HTTP/TCP/IP as the main protocols involved, but in reality many more participate, all depending on the usage, the networks involved, and so on. Protocols like DNS, DHCP, ARP, OSPF, Ethernet, 802.11 (WiFI), ... and many others get involved. Looking inside ISPs' own networks would reveal dozens more.

Additionally, it's worth noting that the traditional 7-layer OSI model characterization does not fit libp2p. Instead, we categorize protocols based on their role, the problem they solve. The upper layers of the OSI model are geared towards point-to-point links between applications, whereas the libp2p protocols speak more towards various sizes of networks, with various properties, under various different security models. Different libp2p protocols can have the same role (in the OSI model, this would be "address the same layer"), meaning that multiple protocols can run simultaneously, all addressing one role (instead of one-protocol-per-layer in traditional OSI stacking) For example, bootstrap lists, mDNS, DHT Discovery, and PEX are all forms of the role "Peer Discovery"; they can coexist and even synergize.

Member

updated the msg above o/

Member Author

note: Same here. I've added, this comment didn't get 'outdated' because it was a couple of lines below


Before diving into the libp2p protocols, it is important to understand the large diversity of protocols already in wide use and deployment that help maintain today's simple abstractions. For example, when one thinks about an HTTP connection, one might naively think of HTTP/TCP/IP as the main protocols involved, but in reality many more participate, depending on the usage, the networks involved, and so on. Protocols like DNS, DHCP, ARP, OSPF, Ethernet, 802.11 (Wi-Fi), and many others get involved. Looking inside ISPs' own networks would reveal dozens more.

Additionally, it's worth noting that the traditional 7-layer OSI model characterization does not fit libp2p. Instead, we categorize protocols based on their role, that is, the problem they solve. The upper layers of the OSI model are geared towards point-to-point links between applications, whereas the libp2p protocols speak more towards various sizes of networks, with various properties, under various security models. Different libp2p protocols can have the same role (in the OSI model, this would be "address the same layer"), meaning that multiple protocols can run simultaneously, all addressing one role (instead of one-protocol-per-layer in traditional OSI stacking). For example, bootstrap lists, mDNS, DHT Discovery, and PEX are all forms of the role "Peer Discovery"; they can coexist and even synergize.

### 2.2.1 Establishing the physical Link

- Ethernet
- Wi-Fi
- Bluetooth
- USB

### 2.2.2 Addressing a machine or process

- IPv4
- IPv6
- Hidden Addressing, like SDP
Member

  • Oblivious routing systems, like Tor and I2P

Member Author

note: I've added, this comment didn't get 'outdated' because it was a couple of lines below


### 2.2.3 Discovering other peers or services

- ARP
- DHCP
- DNS
- Onion

### 2.2.4 Routing messages through the Network

- RIP(1, 2)
- OSPF
- PPP
- Tor
- I2P
- cjdns

### 2.2.5 Transport

- TCP
- UDP
- UDT
- QUIC
- WebRTC DataChannel

### 2.2.6 Agreed semantics for applications to talk to each other

- RMI
- Remoting
- RPC
- HTTP


## 2.3 Current Shortcomings

Although we currently have a panoply of protocols available for our services to communicate, the abundance and variety of solutions is also a shortcoming. It is currently difficult for an application to support, and be reachable through, several transports (for example, browser applications lack a TCP/UDP stack).

There is also no 'presence linking': there is no way for a peer to announce itself over several transports such that other peers can verify that it is always the same peer.
104 changes: 104 additions & 0 deletions protocol/network/3-requirements.md
@@ -0,0 +1,104 @@
3 Requirements and considerations
=================================

## 3.1 NAT traversal

Network Address Translation is ubiquitous on the internet. Not only are most consumer devices behind many layers of NATs, but datacenter nodes are often behind NAT too, for security or virtualization reasons. As we move into containerized deployments, this is getting worse. IPFS implementations SHOULD provide a way to traverse NATs; otherwise it is likely that operation will be affected. Even nodes meant to run with real IP addresses must implement NAT traversal techniques, as they may need to establish connections to peers behind NAT.

libp2p accomplishes full NAT traversal using an ICE-like protocol. It is not exactly ICE, because IPFS networks can carry traffic over the IPFS protocol itself, either to coordinate hole-punching or to relay the communication outright.

It is recommended that implementations use one of the many NAT traversal libraries available, such as `libnice`, `libwebrtc`, or `natty`. However, NAT traversal must be interoperable.

## 3.2 Relay

Unfortunately, due to symmetric NATs, container and VM NATs, and other impossible-to-bypass NATs, libp2p MUST fall back to relaying communication to establish a full connectivity graph. To be complete, implementations MUST support relay, though it SHOULD be optional and able to be turned off by end users.

## 3.3 Encryption

Communications on libp2p may be:

- **encrypted**
- **signed** (not encrypted)
- **clear** (not encrypted, not signed)

We take both security and performance seriously. We recognize that encryption is not viable for some in-datacenter high performance use cases.

We recommend that:
- implementations encrypt all communications by default
- implementations are audited
- unless absolutely necessary, users normally operate with encrypted communications only.

libp2p uses cipher suites like those of TLS.

**NOTE:** we do not use TLS directly, because we do not want the CA system baggage. Most TLS implementations are very big. Since the libp2p model begins with keys, libp2p only needs to apply ciphers. This is a minimal portion of the whole TLS standard.
Member Author

This doesn't hold true today and must be updated. Discussion - #29


## 3.4 Transport Agnostic

libp2p is transport agnostic, so it can run over any transport protocol. It does not even depend on IP; it may run on top of NDN, XIA, and other new internet architectures.

In order to reason about possible transports, libp2p uses [multiaddr](https://github.com/jbenet/multiaddr), a self-describing addressing format. This makes it possible for libp2p to treat addresses opaquely everywhere in the system, and to support various transport protocols in the network layer. The actual format of addresses in libp2p is `ipfs-addr`, a multiaddr that ends with an ipfs nodeid. For example, these are all valid `ipfs-addrs`:

```
# ipfs over tcp over ipv6 (typical tcp)
/ip6/fe80::8823:6dff:fee7:f172/tcp/4001/ipfs/QmYJyUMAcXEw1b5bFfbBbzYu5wyyjLMRHXGUkCXpag74Fu

# ipfs over utp over udp over ipv4 (udp-shimmed transport)
/ip4/162.246.145.218/udp/4001/utp/ipfs/QmYJyUMAcXEw1b5bFfbBbzYu5wyyjLMRHXGUkCXpag74Fu

# ipfs over ipv6 (unreliable)
/ip6/fe80::8823:6dff:fee7:f172/ipfs/QmYJyUMAcXEw1b5bFfbBbzYu5wyyjLMRHXGUkCXpag74Fu

# ipfs over tcp over ip4 over tcp over ip4 (proxy)
/ip4/162.246.145.218/tcp/7650/ip4/192.168.0.1/tcp/4001/ipfs/QmYJyUMAcXEw1b5bFfbBbzYu5wyyjLMRHXGUkCXpag74Fu

# ipfs over ethernet (no ip)
/ether/ac:fd:ec:0b:7c:fe/ipfs/QmYJyUMAcXEw1b5bFfbBbzYu5wyyjLMRHXGUkCXpag74Fu
```

**Note:** at this time, no implementations of unreliable transports exist. The protocol's interface for defining and using unreliable transport has not been defined.

**TODO:** define how unreliable transport would work. base it on webrtc.
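
As a rough illustration of how an `ipfs-addr` like the examples above decomposes into protocol components, here is a minimal text-level sketch. It is not the multiaddr library: `parse_ipfs_addr` is a hypothetical helper, and the set of value-less protocols only covers what appears in the examples.

```python
from typing import List, Optional, Tuple

# Assumption: of the protocols used in the examples, only utp takes no value.
VALUELESS_PROTOCOLS = {"utp"}


def parse_ipfs_addr(addr: str) -> List[Tuple[str, Optional[str]]]:
    """Split an ipfs-addr string into (protocol, value) pairs, outermost first."""
    if not addr.startswith("/"):
        raise ValueError("address must start with '/'")
    tokens = addr[1:].split("/")
    components: List[Tuple[str, Optional[str]]] = []
    index = 0
    while index < len(tokens):
        name = tokens[index]
        index += 1
        if name in VALUELESS_PROTOCOLS:
            components.append((name, None))
            continue
        if index >= len(tokens):
            raise ValueError(f"protocol {name!r} is missing its value")
        components.append((name, tokens[index]))
        index += 1
    # An ipfs-addr is a multiaddr that ends with an ipfs nodeid.
    if not components or components[-1][0] != "ipfs":
        raise ValueError("an ipfs-addr must end with /ipfs/<nodeid>")
    return components


if __name__ == "__main__":
    example = ("/ip4/162.246.145.218/udp/4001/utp/ipfs/"
               "QmYJyUMAcXEw1b5bFfbBbzYu5wyyjLMRHXGUkCXpag74Fu")
    for protocol, value in parse_ipfs_addr(example):
        print(protocol, value)
```

Because each component names its own protocol, code that only cares about the final node ID can ignore the transport components in front of it.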

## 3.5 Multi-Multiplexing

The libp2p Protocol is a collection of multiple protocols. In order to conserve resources, and to make connectivity easier, libp2p can perform all its operations through a single port, such as a TCP or UDP port, depending on the transports used. libp2p can multiplex its many protocols through point-to-point connections. This multiplexing is for both reliable streams and unreliable datagrams.

libp2p is pragmatic. It seeks to be usable in as many settings as possible, to be modular and flexible to fit various use cases, and to force as few choices as possible. Thus the libp2p network layer provides what we're loosely referring to as "multi-multiplexing":

- can multiplex multiple listen network interfaces
- can multiplex multiple transport protocols
- can multiplex multiple connections per peer
- can multiplex multiple client protocols
- can multiplex multiple streams per protocol, per connection (SPDY, HTTP2, QUIC, SSH)
- has flow control (backpressure, fairness)
- encrypts each connection with a different ephemeral key

To give an example, imagine a single IPFS node that:

- listens on a particular TCP/IP address
- listens on a different TCP/IP address
- listens on a SCTP/UDP/IP address
- listens on a UDT/UDP/IP address
- has multiple connections to another node X
- has multiple connections to another node Y
- has multiple streams open per connection
- multiplexes streams over http2 to node X
- multiplexes streams over ssh to node Y
- one protocol mounted on top of libp2p uses one stream per peer
- one protocol mounted on top of libp2p uses multiple streams per peer

Not providing this level of flexibility makes it impossible to use libp2p on various platforms, or in various use cases or network setups. It is not important that all implementations support all choices; what is critical is that the spec is flexible enough to allow implementations to use precisely what they need. This ensures that complex user or application constraints do not rule out libp2p as an option.
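
To illustrate just one of the multiplexing levels listed above, multiple streams over a single connection, here is a minimal framing sketch. The frame layout (a 4-byte stream id followed by a 2-byte payload length) is an assumption made for the example; it is not the format used by SPDY, HTTP2, QUIC, SSH, or any libp2p stream muxer.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical frame header: big-endian 4-byte stream id, 2-byte payload length.
HEADER = struct.Struct(">IH")


def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with the stream it belongs to and its length."""
    return HEADER.pack(stream_id, len(payload)) + payload


def mux(chunks: Iterable[Tuple[int, bytes]]) -> bytes:
    """Interleave chunks from several streams into one byte sequence."""
    return b"".join(encode_frame(stream_id, data) for stream_id, data in chunks)


def demux(wire: bytes) -> Dict[int, bytes]:
    """Reassemble per-stream data from the interleaved byte sequence."""
    streams: Dict[int, bytes] = defaultdict(bytes)
    offset = 0
    while offset < len(wire):
        stream_id, length = HEADER.unpack_from(wire, offset)
        offset += HEADER.size
        streams[stream_id] += wire[offset:offset + length]
        offset += length
    return dict(streams)


if __name__ == "__main__":
    wire = mux([(1, b"hel"), (2, b"ping"), (1, b"lo")])
    print(demux(wire))
```

A real muxer would also need stream open and close signalling and the flow control mentioned in the list; this sketch covers only the interleaving.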

## 3.6 Enable several network topologies

Different systems have different requirements, and with them come different topologies. In the P2P literature, these topologies are enumerated as: unstructured, structured, hybrid, and centralised.

Centralised topologies are the most common in web application infrastructures. They require a given service or services to be present at all times in a known static location, so that other services can access them. Unstructured networks are P2P networks whose topology is completely random, or at least non-deterministic, while structured networks have an implicit way of organizing themselves. Hybrid networks are a mix of the last two.

With this in consideration, libp2p must be ready to support different routing and peer discovery mechanisms, in order to build the routing tables that will enable services to propagate messages or to find each other.

## 3.7 Resource Discovery

libp2p also addresses the discoverability of resources inside a network through Records. A record is a unit of data that can be digitally signed, timestamped, and/or combined with other methods to give it an ephemeral validity. Records hold pieces of information, such as the location or availability of resources present in the network. These resources can be data, storage, CPU cycles, and other types of services.

libp2p must not constrain the location of resources; instead, it should offer ways to find them easily in the network or through a side channel.
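
The idea of a record with ephemeral validity can be sketched as below. This is a sketch under loud assumptions: the field names are invented for the example, and an HMAC with a shared key stands in for a real public-key digital signature, since the Python standard library offers no signature scheme. It is not the libp2p record format.

```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _payload(body: Dict[str, Any]) -> bytes:
    """Serialize the record body deterministically so the tag is reproducible."""
    return json.dumps(body, sort_keys=True).encode("utf-8")


def make_record(key: bytes, value: str, issued_at: int, ttl_seconds: int) -> Dict[str, Any]:
    """Build a record with an expiry time and an authentication tag.

    Assumption: HMAC-SHA256 is a stand-in for a digital signature.
    """
    body = {"value": value, "issued_at": issued_at, "expires_at": issued_at + ttl_seconds}
    tag = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def is_valid(key: bytes, record: Dict[str, Any], now: int) -> bool:
    """Accept a record only if its tag verifies and it has not expired."""
    body = record["body"]
    expected = hmac.new(key, _payload(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(record["tag"])) and now < body["expires_at"]


if __name__ == "__main__":
    # Hypothetical record announcing where a resource can be found.
    record = make_record(b"shared key", "storage at /ip4/192.168.0.1/tcp/4001",
                         issued_at=1000, ttl_seconds=60)
    print(is_valid(b"shared key", record, now=1030))
```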