This repository has been archived by the owner on Jul 23, 2022. It is now read-only.

Integrating IPFS (Haskell Implementation) #1

Closed
CMCDragonkai opened this issue Jan 22, 2017 · 32 comments

@CMCDragonkai
Member Author

CMCDragonkai commented Jan 22, 2017

Also see this: libp2p/go-libp2p#175

Current way to actually make use of go-libp2p:

go get -u github.com/whyrusleeping/gx
go get -u github.com/whyrusleeping/gx-go

# these commands are weird because of a quirk in go-libp2p
go get -d github.com/libp2p/go-libp2p && cd $GOPATH/src/github.com/libp2p/go-libp2p && gx-go rewrite

git clone https://github.com/libp2p/go-libp2p.git
cd go-libp2p
make
make deps
go build ./examples/echo

There's no way to fetch specific versions or tags using the official go get, so the head of a dependency will sometimes be incompatible with older go versions.

@CMCDragonkai
Member Author

CMCDragonkai commented Feb 1, 2017

There are many NAT implementations, and the behaviour of NAT is not standardised. Here's a list: https://en.wikipedia.org/wiki/Network_address_translation#Methods_of_translation (notice how the mapping of client IP and port to NAT IP and port differs, and how the NAT decides whether to accept packets from external servers)

Each implementation of NAT means a different kind of NAT traversal technique will need to be used. And in some cases, NAT hole-punching will be impossible.

The solution that always works is a relay server. This of course is just client-server communication, and it means we don't have a true P2P or distributed system, but a system with super nodes.

We have discovered 2 kinds of centralisation in libp2p. The first is in peer discovery, which is done through a bootstrap list (alternatives include DHT and Peer Discovery): you always need some knowledge ahead of time before you can join a network (or you brute-force it by mass-scanning the internet). The second is the use of relay servers for NAT traversal.

IPFS has started by just implementing NAT traversal via relays, and decided to optimise for direct connections later. Here are some resources:

Even with NAT traversal implemented, there is a problem with multi-homing or when you change networks. ipfs/kubo#2413

When a client initiates a connection, it usually does so from an ephemeral source port. When this client IP and source port are mapped by the NAT, the NAT can suffer from port exhaustion, since each external IP is limited to 65535 concurrent connections. However, NATs can have multiple external IPs. This is usually implemented via dynamic NAT (where the NAT can assign different internal clients to different external IPs). Also see this blog post by nginx, https://www.nginx.com/blog/overcoming-ephemeral-port-exhaustion-nginx-plus/, which shows that ephemeral port exhaustion can also occur in load balancers and reverse proxies (because they act as clients when proxying the connection out).

@CMCDragonkai
Member Author

CMCDragonkai commented Feb 1, 2017

Using godepgraph to understand the dependency structure of IPFS:

go-libp2p-host: [image: go-libp2p-host dependency graph]

go-libp2p-swarm: [image: go-libp2p-swarm dependency graph]

go-libp2p-net: [image: go-libp2p-net dependency graph]

The host object appears to wrap the swarm object. Here, host just represents a single node in a libp2p network, while swarm is the underlying implementation of the p2p network and is what brings in all the dependencies. The net is just a subcomponent of swarm.

@CMCDragonkai changed the title from "Current Progress" to "Integrating IPFS (Haskell Implementation)" on Feb 1, 2017
@CMCDragonkai
Member Author

CMCDragonkai commented Feb 1, 2017

IPFS tries to abstract over multiple transports. A common feature of these transports is the ability to multiplex streams of data.

The interface that all these different transports need to implement is: https://github.com/jbenet/go-stream-muxer

go-ipfs implements a variety of transports; however, node-ipfs only implements spdystream: https://github.com/libp2p/js-libp2p-spdy

This allows the go implementation and the js implementation to communicate. New implementations (in different languages) must also implement one of the transport layers. However, I'm guessing that the haskell implementation would have to implement spdystream in order to talk to both the go and js implementations. Implementing any other one would mean that the haskell implementation cannot talk to the js implementation atm, and the p2p network would not be complete.

Once you have multiple protocols supported, you still need some negotiation between nodes to choose which transport protocol to use. This is done via https://github.com/multiformats/multistream-select which is implemented in go as https://github.com/multiformats/go-multistream
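To make the negotiation step more concrete, here is a minimal Haskell sketch of how a single multistream-select message could be framed on the wire. It assumes the framing described in the multistream-select README: a protocol path terminated by a newline and prefixed with its length as an unsigned varint. The names `encodeVarint` and `frame` are hypothetical helpers for illustration, not part of any existing library.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Unsigned varint (LEB128): 7 bits per byte, least significant group
-- first, high bit set on every byte except the last.
-- Assumes a non-negative input.
encodeVarint :: Int -> [Word8]
encodeVarint n
  | n < 0x80  = [fromIntegral n]
  | otherwise = fromIntegral ((n .&. 0x7f) .|. 0x80) : encodeVarint (n `shiftR` 7)

-- Frame one message: varint length (newline included), then the
-- protocol path, then the trailing newline.
frame :: String -> BS.ByteString
frame proto = BS.pack (encodeVarint (BS.length body)) <> body
  where
    body = BC.pack proto <> BC.pack "\n"
```

Both sides would start by exchanging `frame "/multistream/1.0.0"`, after which the dialer proposes a protocol path and the listener either echoes it back or replies with `na`.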

Note that while certain language implementations have "interfaces" and "implementations", there is often a spec that acts as an "interface" across language implementations. (Don't get confused by their language colours there, some of these are markdown files with some test scripts). An example of this is:

In the end, it appears that spdy is winning. But we need to know what version of spdy we are going to try to implement.

@CMCDragonkai
Member Author

CMCDragonkai commented Feb 12, 2017

Reading through libp2p specs completed.

Next steps:

@ghost

ghost commented Mar 26, 2017

@CMCDragonkai
Member Author

Problem with port 0: multiformats/go-multiaddr#51

@ghost

ghost commented May 16, 2017

Need to look into using a phantom type to mark the inputstream of a multistream as either a validated connection (a successful negotiation has occurred) or an unvalidated connection (the multistream has not negotiated an underlying protocol for the connection for which raw data is communicated)
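A rough sketch of what the phantom type idea could look like. Everything here (`Conn`, `negotiate`, `sendRaw`, the protocol strings) is a hypothetical stand-in rather than the actual multistream API; the point is only that the type parameter records whether negotiation has happened.

```haskell
-- Phantom tags: no constructors, they exist only at the type level.
data Unvalidated
data Validated

-- Hypothetical connection wrapper. The `state` parameter is phantom:
-- it does not appear on the right-hand side.
newtype Conn state = Conn { connProtocol :: Maybe String }

-- A fresh connection has not negotiated anything yet.
newConn :: Conn Unvalidated
newConn = Conn Nothing

-- Toy negotiation: succeeds only if the proposed protocol is in the
-- list the remote side supports; otherwise reports "na".
negotiate :: [String] -> String -> Conn Unvalidated -> Either String (Conn Validated)
negotiate supported proto _
  | proto `elem` supported = Right (Conn (Just proto))
  | otherwise              = Left ("na: " ++ proto)

-- Raw data can only be sent on a validated connection. Passing a
-- `Conn Unvalidated` here is rejected by the type checker.
sendRaw :: Conn Validated -> String -> String
sendRaw (Conn p) payload = maybe "?" id p ++ " <- " ++ payload
```

With this shape, `sendRaw newConn "hi"` does not compile, while a connection returned in the `Right` case of `negotiate` can be used freely.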

After negotiating some underlying protocol connection with multistream muxer, can you then renegotiate another protocol connection with that multistream muxer using some other protocol? Need to raise this as an issue somewhere.

Also need to see about binary, cereal libraries for extending the possible encodings used to encode ipfs addresses. And whether efficient text encoding is necessary somewhere in the usual pipeline.

@ghost

ghost commented May 16, 2017

@ghost

ghost commented May 16, 2017

@ghost

ghost commented May 19, 2017

@CMCDragonkai
Member Author

Implemented encode and decode on haskell-multiaddress, going to add some extra utility functions, and then prepare things for Hackage

@whyrusleeping

Hey @CMCDragonkai, This is an awesome log! Very happy to have stumbled across it :)

So many things here. One that stuck out is the stream multiplexer: we're mainly achieving go and js interop via the 'multiplex' muxer; the spdy implementation we have in go (from docker) is incomplete (it lacks a few features). The go multiplex code is here: https://github.com/whyrusleeping/go-multiplex
The multiplex protocol itself can be slow due to head of line blocking, and we're wanting to use something else, but it works. It also has the advantage of being very simple, and thus easy for people to implement in new languages.

Please let me know if you have any questions, this is all super cool!

@ghost

ghost commented May 20, 2017

@CMCDragonkai
Member Author

CMCDragonkai commented May 21, 2017

@whyrusleeping Welcome, and glad you stumbled on this. @plintx is working on the muxer and multistream ports. I'm cleaning up the multiaddr impl. I'm sure if we have further questions, we'll reach out.

@CMCDragonkai
Member Author

Hey @whyrusleeping, the haskell port of multiaddr is now feature complete relative to the go implementation. It also uses multiformats' existing haskell-multihash. It is uploaded to Hackage as hs-multiaddr, as multiaddr is already taken. Can you add it to the list of implementations of Multiaddr in the Multiaddr repo? Here is the repo: https://github.com/MatrixAI/haskell-multiaddr

Note that I ported all relevant tests from the go impl.

@CMCDragonkai
Member Author

Next steps:

  1. peer
  2. peerstore

@Kubuxu

Kubuxu commented Jul 5, 2017

Also as a note, we couldn't find a good implementation of spdy in Golang, which is why we implemented multiplex in Golang and fixed some issues in its JS implementation:

Original: https://github.com/maxogden/multiplex
Go: https://github.com/whyrusleeping/go-multiplex
JS fixes: https://github.com/diasdavid/multiplex

There is also work going on implementing QUIC support in go-ipfs.

@CMCDragonkai
Member Author

@ghost

ghost commented Sep 23, 2017

@CMCDragonkai
Member Author

Next step: fundamental research on multiplexing to design the correct abstraction for representing recursive layers of multiplexing, in order to provide an elegant, self-contained, consistent multiplexing component of haskell libp2p.

Also combine libp2p network, swarm and peer stream swarm together into a single "swarm"/"cluster"/"network" component.

@CMCDragonkai
Member Author

CMCDragonkai commented Sep 25, 2017

All multiplexing can be reduced to:

Muxer [Input] Output
Demuxer Input [Output]

Try using the io streams library for Input and Output. The end result is building up the necessary interfaces for libp2p streams, connections and transports using just underlying haskell io-stream interface. All of the dial functionality should be built on top of existing stream creation. https://github.com/snapframework/io-streams

Muxers and demuxers are just user-defined combinators.

Input and Output types should be able to be created independently and composed together, and executed only in the IO monad context.

While Input and Output are abstract data types, it should be possible to create them from lower level constructs like sockets and file descriptors, just like it is possible to wrap simple generators as conduit coroutines. Consider the network package for this lower level functionality: https://hackage.haskell.org/package/network

This means Input and Output may be just wrappers (possibly newtypes or type classes) of IO streams.
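As a rough illustration of the `Muxer [Input] Output` and `Demuxer Input [Output]` shapes, here is a small sketch that models Input and Output as plain IO actions instead of using io-streams directly. The type aliases mirror the read/write style of io-streams (`Nothing` signals end of stream), but `Input`, `Output`, `fromList`, `collector`, `mux` and `demux` are all hypothetical names, and the round-robin policy is just one possible choice.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef, modifyIORef')

-- Simplified stand-ins for io-streams' InputStream and OutputStream.
type Input a  = IO (Maybe a)        -- Nothing means end of stream
type Output a = Maybe a -> IO ()    -- Nothing signals end of stream

-- Build an Input that yields the elements of a list.
fromList :: [a] -> IO (Input a)
fromList xs = do
  ref <- newIORef xs
  pure $ do
    rest <- readIORef ref
    case rest of
      []     -> pure Nothing
      (y:ys) -> writeIORef ref ys >> pure (Just y)

-- Build an Output that records what is written, plus an action to read it back.
collector :: IO (Output a, IO [a])
collector = do
  ref <- newIORef []
  let out Nothing  = pure ()
      out (Just x) = modifyIORef' ref (x :)
  pure (out, reverse <$> readIORef ref)

-- Muxer [Input] Output: read one item from each live input in turn,
-- tagging every item with the index of the input it came from.
mux :: [Input a] -> Output (Int, a) -> IO ()
mux inputs out = go (zip [0 ..] inputs)
  where
    go []     = out Nothing
    go active = do
      still <- mapM step active
      go [c | Just c <- still]
    step c@(i, inp) = do
      m <- inp
      case m of
        Nothing -> pure Nothing
        Just x  -> out (Just (i, x)) >> pure (Just c)

-- Demuxer Input [Output]: route each tagged item to the output with
-- the matching index; items for unknown indices are dropped.
demux :: Input (Int, a) -> [Output a] -> IO ()
demux inp outs = loop
  where
    table = zip [0 ..] outs
    loop = do
      m <- inp
      case m of
        Nothing     -> mapM_ ($ Nothing) outs
        Just (i, x) -> maybe (pure ()) ($ Just x) (lookup i table) >> loop
```

Because the muxed stream is itself just an `Input (Int, a)`, it can be fed into another `mux`, which is the kind of recursive layering mentioned earlier.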

@CMCDragonkai
Member Author

CMCDragonkai commented Sep 26, 2017

For certain operations, such as dealing with multiaddresses, we cannot just convert a network socket directly to an iostream because we lose information. Instead we must retain the socket itself. This could be done with a record that keeps the underlying socket and also the stream on the socket.

The multiplexer doesn't care about all the properties of the input and output. It will simply ask that its inputs and outputs meet a simple interface that allows it to perform its multiplexing logic.
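A minimal sketch of such a record, with the socket type left as a parameter so the example does not depend on the network package. `FakeSocket`, `Conn` and `connMultiaddr` are hypothetical names used only for illustration; a real version would hold a socket from the network package and proper stream types.

```haskell
-- Hypothetical stand-in for a real socket: just the address information
-- that would otherwise be lost once only the stream is kept.
data FakeSocket = FakeSocket
  { sockHost :: String
  , sockPort :: Int
  }

-- The connection keeps both the underlying socket and the stream
-- operations built on top of it.
data Conn sock = Conn
  { connSocket :: sock
  , connRead   :: IO (Maybe String)
  , connWrite  :: Maybe String -> IO ()
  }

-- Because the socket is retained, address-level information such as
-- the multiaddress can still be derived from the connection.
connMultiaddr :: Conn FakeSocket -> String
connMultiaddr c = "/ip4/" ++ sockHost s ++ "/tcp/" ++ show (sockPort s)
  where
    s = connSocket c
```

A multiplexer would only need `connRead` and `connWrite`, while the multiaddress layer would only need `connSocket`.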

The "Conn" type in go libp2p encompasses multiple layers, almost like its own OSI layer stack (but separated by features rather than protocols http://davidad.github.io/blog/2014/04/24/an-osi-laye), where the base implementation is just go's stdlib net functionality. Layers include things like multiaddress enhancement, security involving private and public keys and finally multiplexing. In our case, we can also create a "Conn" type that is built on top of Haskell's Networking library, but instead of spreading it out over multiple repositories, the entire thing can be in one repository, and itself can implement the interface that the multiplexer requires, which may allow connections to be multiplexed over connections.

As the multiplexer implementation can be injected into a connection, this allows any kind of multiplexing to occur. It should also be possible to make multiplexers compositional, so that multiplexers can be composed with multiplexers (maybe in an operad manner).

Since we're going to rely on the Haskell networking library, which in turn relies on the OS kernel TCP/IP stack, this basically means we don't really have full control over the entire network. However it should be possible to instead implement on top of a user-space networking library like Hans. But this is left for the future. (To do this you'd need typeclasses for every integration to a lower level implementation of networking.)

@ghost

ghost commented Nov 1, 2017

What a spdy connection in libp2p looks like based on js-libp2p packet inspection (sorry for the bad handwriting, should create a proper diagram later):
[image: spdy_libp2p]

@ghost

ghost commented Nov 4, 2017

Also here's the packet capture from wireshark for the initial negotiation between two nodes using the js-libp2p implementation (from the js-libp2p echo example, with some of the options for secio disabled and using the spdy muxer instead of libp2p-multiplex):
libp2p_node_negotiation.pcapng.gz

@whyrusleeping

@plintx Hey, there shouldn't be any need to do a packet capture for this. Is there a certain specification you're looking for?

cc @diasdavid

@ghost

ghost commented Nov 5, 2017

@whyrusleeping
No, not really. I'm already aware of the IETF spec for spdy and http2, and the libp2p multistream-select spec; I was just unsure about some details that were easier to confirm with a packet capture.

@daviddias

@plintx Happy to go through more details :) The best way to implement libp2p today is by doing module by module. In https://github.com/libp2p/js-libp2p/tree/master/examples you have multiple examples that show you from attaching a transport, to attaching stream multiplexing, protocol multiplexing, crypto channel and discovery. You can use these examples to enable only a subset of the features so that you can implement your own version incrementally.

@CMCDragonkai
Member Author

CMCDragonkai commented Nov 5, 2017

The packet capture was pretty useful in understanding the concurrent behaviour and the encoding that happens at line level. It essentially visualised the spec. Also it showed different things happening between js libp2p and go libp2p, which makes sense since they are using different protocols.

@ghost

ghost commented Nov 5, 2017

@diasdavid Yes, thanks for providing those examples, very useful, and so easy to switch between muxers and transports.

Actually, I was wondering about server push: is this feature used in libp2p implementations atm? It seems like there could be some interesting use cases for it.

@CMCDragonkai
Member Author

CMCDragonkai commented Sep 6, 2021

We are no longer using libp2p. We've ended up building our own P2P protocols with WireGuard, UTP, gRPC and custom NAT traversal.

We are however using CID spec in a number of projects now. Still not sure if CIDs should be used for identifying nodes themselves in a P2P network.
