Integrating IPFS (Haskell Implementation) #1
Also see this: libp2p/go-libp2p#175
Current way to actually make use of go-libp2p:
go get -u github.com/whyrusleeping/gx
go get -u github.com/whyrusleeping/gx-go
# these commands are weird because of a quirk in go-libp2p
go get -d github.com/libp2p/go-libp2p && cd $GOPATH/src/github.com/libp2p/go-libp2p && gx-go rewrite
git clone https://github.com/libp2p/go-libp2p.git
cd go-libp2p
make
make deps
go build ./examples/echo
There's no way to fetch specific versions or tags using the official |
There are many NAT implementations, and NAT behaviour is not standardised. Here's a list: https://en.wikipedia.org/wiki/Network_address_translation#Methods_of_translation (notice how the mapping of client IP and port to NAT IP and port differs, and how the NAT decides whether to accept packets from external servers). Each kind of NAT needs a different NAT traversal technique, and in some cases NAT hole-punching will be impossible. The solution that always works is a relay server. This of course is just client-server communication, and it means we don't have a true P2P or distributed system, but a system with super nodes.
We have discovered 2 kinds of centralisation in libp2p. The first is peer discovery, which is done through a bootstrap list (alternatives include DHT and Peer Discovery): you always need some knowledge ahead of time before you can join a network (or you brute-force it by mass-scanning the internet). The other is the use of relay servers for NAT traversal. IPFS started by just implementing NAT traversal via relays, and decided to optimise for direct connections later. Here are some resources:
Even with NAT traversal implemented, there is a problem with multi-homing, or when you change networks: ipfs/kubo#2413
When a client initiates a connection, it usually uses an ephemeral source port. When this client IP and source port are mapped onto the NAT, the NAT can suffer from port exhaustion: port numbers are 16 bits wide, which limits each external IP to 65535 concurrent connections. However, NATs can have multiple external IPs. This is usually implemented via dynamic NAT (where the NAT can assign different internal clients to different external IPs). Also see this blog post by nginx, https://www.nginx.com/blog/overcoming-ephemeral-port-exhaustion-nginx-plus/, where ephemeral port exhaustion can occur in load balancers and reverse proxies (because they also act as clients when proxying the connection out). |
Using godepgraph to understand the dependency structure of IPFS: The host object appears to wrap the swarm object. Host here just represents a single node in a libp2p network, while swarm is the underlying implementation of the p2p network, and is what brings in all the dependencies. The net is just a subcomponent of swarm. |
IPFS tries to abstract over multiple transports. A common feature of these transports is the ability to multiplex streams of data. The interface that all these different transports need to implement is: https://github.com/jbenet/go-stream-muxer
go-ipfs implements a variety of transports, however node-ipfs only implements spdystream: https://github.com/libp2p/js-libp2p-spdy This allows the go implementation and the js implementation to communicate. New implementations (in different languages) must also implement one of the transport layers. I'm guessing that the haskell implementation would have to implement spdystream in order to be able to talk to both the go implementation and the js implementation. Implementing any other one would mean that the haskell implementation cannot talk to the js implementation at the moment, and the p2p network would not be complete.
Once multiple protocols are supported, nodes still need to negotiate which transport protocol to use. This is done via https://github.com/multiformats/multistream-select which is implemented in go as https://github.com/multiformats/go-multistream
Note that while certain language implementations have "interfaces" and "implementations", there is often a spec that acts as an "interface" across language implementations. (Don't get confused by the language colours there, some of these are markdown files with some test scripts.) An example of this is:
In the end, it appears that spdy is winning. But we need to know what version of spdy we are going to try to implement. |
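To make the multistream-select negotiation mentioned above more concrete, here is a minimal sketch of the wire framing as I read the spec: each message is an unsigned varint length followed by the protocol string and a newline, and a listener answers "na" when it does not support the proposal. The function names (`encodeVarint`, `frame`, `respond`) are made up for illustration and are not taken from any existing library, and the protocol IDs used when trying it out are only examples.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Unsigned varint (LEB128): 7 bits per byte, least significant group first,
-- high bit set on every byte except the last. Expects a non-negative input.
encodeVarint :: Int -> [Word8]
encodeVarint n
  | n < 0x80  = [fromIntegral n]
  | otherwise = (fromIntegral (n .&. 0x7f) .|. 0x80) : encodeVarint (n `shiftR` 7)

-- Frame one multistream-select message: varint length, then the payload.
-- The payload is the protocol string plus a trailing newline, and the
-- length counts that newline. Assumes an ASCII protocol string.
frame :: String -> [Word8]
frame protocol = encodeVarint (length payload) ++ payload
  where
    payload = map (fromIntegral . ord) protocol ++ [0x0a]

-- Hypothetical pure model of the listener's decision: echo the proposal
-- back if it is supported, otherwise reply "na" (not available).
respond :: [String] -> String -> String
respond supported proposal
  | proposal `elem` supported = proposal
  | otherwise                 = "na"
```

For example, `frame "/multistream/1.0.0"` gives 20 bytes: the length byte 0x13 (19), the 18 characters of the protocol string, and the newline.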
List of currently defined interfaces (specifications for implementation) in the ipfs/libp2p ecosystem: libp2p: ipfs: |
Reading through libp2p specs completed. Next steps:
|
Problem with port 0: multiformats/go-multiaddr#51 |
Need to look into using a phantom type to mark the input stream of a multistream as either a validated connection (a successful negotiation has occurred) or an unvalidated connection (the multistream has not yet negotiated an underlying protocol for the connection over which raw data is communicated).
After negotiating some underlying protocol connection with the multistream muxer, can you then renegotiate another protocol connection with that multistream muxer using some other protocol? Need to raise this as an issue somewhere.
Also need to look at the binary and cereal libraries for extending the possible encodings used to encode ipfs addresses, and at whether efficient text encoding is necessary somewhere in the usual pipeline. |
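A minimal sketch of the phantom type idea, assuming a pure stand-in for the connection (no real sockets). All names here (`Conn`, `Unvalidated`, `Validated`, `open`, `negotiate`, `send`) are hypothetical and only illustrate how the type parameter could track negotiation state.

```haskell
-- Type-level tags with no values; they only ever appear as type arguments.
data Unvalidated
data Validated

-- 'state' is a phantom parameter: it does not occur in any field.
data Conn state = Conn
  { connPeer     :: String        -- hypothetical label for the remote peer
  , connProtocol :: Maybe String  -- protocol agreed so far, if any
  }

-- A fresh connection has not negotiated anything yet.
open :: String -> Conn Unvalidated
open peer = Conn peer Nothing

-- Negotiation is the only way to obtain a validated connection.
negotiate :: [String] -> String -> Conn Unvalidated -> Either String (Conn Validated)
negotiate supported proposal (Conn peer _)
  | proposal `elem` supported = Right (Conn peer (Just proposal))
  | otherwise                 = Left ("protocol not available: " ++ proposal)

-- Only validated connections may carry data. Passing a 'Conn Unvalidated'
-- here is rejected by the type checker. The result is a description of the
-- send rather than real I/O, to keep the sketch pure.
send :: Conn Validated -> String -> String
send (Conn peer proto) payload =
  maybe "<none>" id proto ++ " -> " ++ peer ++ ": " ++ payload
```

With this shape, `send (open "peerA") "hello"` does not compile, while the result of a successful `negotiate` can be passed to `send`. Whether a function of type `Conn Validated -> Conn Unvalidated` should exist depends on the open renegotiation question above.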
Full multicodec table https://github.com/multiformats/multicodec/blob/master/table.csv |
Implemented encode and decode on haskell-multiaddress, going to add some extra utility functions, and then prepare things for Hackage. |
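As an illustration of what encode has to do, here is a small self-contained sketch of the multiaddr binary format for just two protocols. It is not the API of the Haskell port. It assumes the protocol codes from the multiaddr table (4 for ip4, 6 for tcp), each written as a varint followed by the address bytes (4 bytes for ip4, 2 big-endian bytes for the tcp port). The names `Component`, `parse` and `encode` are made up for this example.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)
import Text.Read (readMaybe)

-- Only two components are modelled here; the real table has many more.
data Component
  = IP4 Word8 Word8 Word8 Word8
  | TCP Int
  deriving (Eq, Show)

-- Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

-- Parse the text form, e.g. "/ip4/127.0.0.1/tcp/4001".
parse :: String -> Maybe [Component]
parse ('/' : rest) = go (splitOn '/' rest)
  where
    go [] = Just []
    go ("ip4" : addr : more) = do
      octets <- mapM readOctet (splitOn '.' addr)
      case octets of
        [a, b, c, d] -> (IP4 a b c d :) <$> go more
        _            -> Nothing
    go ("tcp" : port : more) = do
      p <- readMaybe port
      if p >= 0 && p <= 65535 then (TCP p :) <$> go more else Nothing
    go _ = Nothing
    readOctet str = do
      n <- readMaybe str :: Maybe Int
      if n >= 0 && n <= 255 then Just (fromIntegral n) else Nothing
parse _ = Nothing

-- Binary form: protocol code, then the fixed-size address bytes.
-- Codes 4 and 6 are below 0x80, so each fits in a single varint byte.
encode :: [Component] -> [Word8]
encode = concatMap enc
  where
    enc (IP4 a b c d) = [0x04, a, b, c, d]
    enc (TCP p)       = [0x06, fromIntegral (p `shiftR` 8), fromIntegral (p .&. 0xff)]
```

For example, `fmap encode (parse "/ip4/127.0.0.1/tcp/4001")` yields `Just [4,127,0,0,1,6,15,161]`, i.e. the bytes 04 7f 00 00 01 06 0f a1.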
Hey @CMCDragonkai, This is an awesome log! Very happy to have stumbled across it :) So many things here, one that stuck out is on the stream multiplexer: We're mainly achieving go and js interop via the 'multiplex' muxer, the spdy implementation we have in go (from docker) is incomplete (it lacks a few features). The go multiplex code is here: https://github.com/whyrusleeping/go-multiplex Please let me know if you have any questions, this is all super cool! |
@whyrusleeping Welcome for stumbling on this. @plintx is working on the muxer and multistream ports. I'm cleaning up multiaddr impl. I'm sure if we have further questions, we'll reach out. |
Hey @whyrusleeping, the haskell port of multiaddr is now feature complete in relation to the go implementation. It's also using multiformats' existing haskell-multihash. It is uploaded to Hackage as hs-multiaddr, as multiaddr is already taken. Can you add it to the list of implementations of Multiaddr in the Multiaddr repo? Here is the repo: https://github.com/MatrixAI/haskell-multiaddr Note that we ported all relevant tests from the go impl. |
Next steps:
|
Also as a note, we couldn't find a good implementation of spdy in Golang, that is why we implemented Original: https://github.com/maxogden/multiplex There is also work going on to implement QUIC support in go-ipfs. |
Next step: fundamental research on multiplexing to design the correct abstraction for representing recursive layers of multiplexing, in order to provide an elegant, self-contained and consistent multiplexing component of haskell libp2p. Also combine libp2p network, swarm and peer stream swarm together into a single "swarm"/"cluster"/"network" component. |
All multiplexing can be reduced to:
Try using the io streams library for Muxers and demuxers are just user-defined combinators.
While This means |
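One way to picture muxers and demuxers as plain combinators, and recursive layers of multiplexing, is the pure sketch below. It does not use the io-streams library: streams are modelled as lists, and the names `Frame`, `mux` and `demux` are hypothetical. A real muxer would also need framing on the wire, flow control and stream lifecycle handling.

```haskell
import Data.List (transpose)

type StreamId = Int

-- A chunk of data tagged with the logical stream it belongs to.
data Frame a = Frame StreamId a
  deriving (Eq, Show)

-- Multiplex several logical streams onto one carrier by tagging each
-- chunk and interleaving the streams round-robin.
mux :: [(StreamId, [a])] -> [Frame a]
mux streams =
  concat (transpose [map (Frame sid) chunks | (sid, chunks) <- streams])

-- Recover one logical stream from the carrier, preserving chunk order.
demux :: StreamId -> [Frame a] -> [a]
demux sid frames = [chunk | Frame s chunk <- frames, s == sid]
```

Because the output of `mux` is itself just a list, it can be fed in as one of the inputs to another `mux`, giving a carrier of type `[Frame (Frame a)]`. Applying `demux` twice, outer layer first, recovers the inner stream.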
For certain operations, such as dealing with multiaddresses, we cannot just convert a network socket directly to an iostream because we lose information. Instead we must retain the socket itself. This could be done with a record that keeps the underlying socket and also the stream on the socket.
The multiplexer doesn't care about all the properties of its input and output. It simply asks that its inputs and outputs meet a simple interface that allows it to perform its multiplexing logic.
The "Conn" type in go libp2p encompasses multiple layers, almost like its own OSI layer stack (but separated by features rather than protocols http://davidad.github.io/blog/2014/04/24/an-osi-laye), where the base implementation is just go's stdlib net functionality. Layers include things like multiaddress enhancement, security involving private and public keys, and finally multiplexing. In our case, we can also create a "Conn" type that is built on top of Haskell's networking library, but instead of spreading it out over multiple repositories, the entire thing can be in one repository, and it can itself implement the interface that the multiplexer requires, which may allow connections to be multiplexed over connections. As the multiplexer implementation can be injected into a connection, this allows any kind of multiplexing to occur. It should also be possible to make multiplexers compositional, and thus to compose multiplexers with multiplexers (maybe in an operad manner).
Since we're going to rely on the Haskell networking library, which in turn relies on the OS kernel TCP/IP stack, we don't really have full control over the entire network. However, it should be possible to instead implement on top of a user-space networking library like Hans. But this is left for the future. (To do this you'd need typeclasses for every integration with a lower level implementation of networking.) |
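A rough sketch of the record-plus-interface idea, using only base. The class name `Carrier`, the record fields, and the in-memory `loopback` are all assumptions made for illustration. In a real implementation the `sock` parameter would be a socket from the network package and the read and write actions would talk to it.

```haskell
import Data.IORef (atomicModifyIORef', modifyIORef', newIORef)

-- The only thing a multiplexer needs from whatever carries its data.
class Carrier c where
  carrierRead  :: c -> IO String
  carrierWrite :: c -> String -> IO ()

-- Keeps the underlying handle next to the stream view, so information such
-- as addresses is not lost when the socket is wrapped.
data Conn sock = Conn
  { connSocket :: sock
  , connRead   :: IO String
  , connWrite  :: String -> IO ()
  }

instance Carrier (Conn sock) where
  carrierRead  = connRead
  carrierWrite = connWrite

-- In-memory stand-in for a real socket: a read drains whatever has been
-- written so far.
loopback :: sock -> IO (Conn sock)
loopback s = do
  buf <- newIORef ""
  pure Conn
    { connSocket = s
    , connRead   = atomicModifyIORef' buf (\pending -> ("", pending))
    , connWrite  = \bytes -> modifyIORef' buf (++ bytes)
    }

-- Written against the interface only, so it works for any carrier,
-- including a substream that is itself an instance of Carrier.
sendTagged :: Carrier c => Int -> String -> c -> IO ()
sendTagged streamId payload c =
  carrierWrite c (show streamId ++ ":" ++ payload ++ "\n")
```

Since `sendTagged` depends only on `Carrier`, a multiplexed substream that provides its own instance could be passed in place of a `Conn`, which is what would allow connections to be multiplexed over connections.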
Also here's the packet capture from wireshark for the initial negotiation between two nodes using the js-libp2p implementation (from the js-libp2p echo example, with some of the options for secio disabled and using the spdy muxer instead of libp2p-multiplex): |
@plintx Hey, there shouldn't be any need to do a packet capture for this. Is there a certain specification you're looking for? cc @diasdavid |
@whyrusleeping |
@plintx Happy to go through more details :) The best way to implement libp2p today is by doing module by module. In https://github.com/libp2p/js-libp2p/tree/master/examples you have multiple examples that show you from attaching a transport, to attaching stream multiplexing, protocol multiplexing, crypto channel and discovery. You can use these examples to enable only a subset of the features so that you can implement your own version incrementally. |
The packet capture was pretty useful in understanding the concurrent behaviour and the encoding that happens at line level. It essentially visualised the spec. Also it showed different things happening between js libp2p and go libp2p, which makes sense since they are using different protocols. |
@diasdavid Yes, thanks for providing those examples, very useful, and so easy to switch between muxers and transports. Actually, I was wondering about server push: is this feature used in libp2p implementations atm? It seems like there could be some interesting use cases for it. |
No longer using libp2p. We've ended up building our own P2P protocols with wireguard, UTP, grpc and custom NAT traversal. We are, however, using the CID spec in a number of projects now. Still not sure if CIDs should be used for identifying nodes themselves in a P2P network.