-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor WebRTC code #51
Conversation
Introduce `OpeningWebRtcConnection` which only deals with handling the initial Noise handsdhake which starts every WebRTC connection. When `str0m` receives an inbound connection from a new peer, an `OpeningWebRtcConnection` object is created which tracks the Noise handshake state of the connection. This object doesn't have its own asynchronous event loop and is polled by the `WebRtcTransport` object. Once the remote `PeerId` has been received in the second Noise message, `TransportManager` will be notified of the new connection which it validates and either accepts or reject. If the connection is rejected, all connection state is discarded. If the connection is accepted, `OpeningWebRtcConnection` sends the third and final Noise handshake message, fully opening the connection. After that, the object is destroyed and a `WebRtcConnection` object is crated using the `Rtc` object that was held by `OpeningWebRtcConnection`. This connection object has its own asynchronous event loop like connection loops of other transports. Currently the `WebRtcConnection` is just a dummy connection which polls the connection and negotiates protocols for inbound substreams but does nothing else. Proper connection support will be implemented in future commits.
Introduce `encode_multistream_message()` utility function for crafting `multistream-select` messages, correctly detect invalid messages and improve test coverage of the function.
Add support for fallback protocols and increase test coverage
The new event loop of `WebRtcConnection` only deals with opening, accepting and negotiating substreams.
Since Versi was busy with Kusama tests, I spoke with Andre and we decided that I should look into WebRTC in the meantime, even though it was technically your task. This is a big PR so I'm not expecting a comprehensive review but it'll (kind of) make it work so it should be relatively straightforward to fix any further issues. |
What this means for us? I.e., is this causing any issues for substrate/lightp2p, or we can happily live without these issues fixed in |
It means that outbound substreams don't work very well which is a blocker insofar as releasing WebRTC goes. Code needed to fix those issues are quite trivial so I'll open a pull request to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not a thorough review as this is all new stuff for me, and I'm far from understanding all the details. Having said that, looks good.
let message = encode_multistream_message( | ||
std::iter::once(protocol.clone()) | ||
.chain(fallback_names.clone()) | ||
.filter_map(|protocol| Protocol::try_from(protocol.as_ref()).ok()) | ||
.map(|protocol| Message::Protocol(protocol)), | ||
)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably don't understand something, but as per multistream-select spec, protocols are tried one-by-one. Why are we encoding all protocols (including fallback protocols) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to check this but this works with smoldot, both for listener and dialer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. It looks like an incorrect way of handling the handshake and litep2p would reject this message as well but smoldot both sends and accepts Message::Protocols
. I'm kind of tempted to keep it as is since it works but I'll check if smoldot would work with Message::Protocol
This reverts commit 0ca33be. It turned out to be a bad idea to apply major code style changes while having unmerged PRs. Specifically, in #51 I hit the all file long merge conflicts where it's impossible to manually track important changes and merge them. Applying the new code style to the PR itself also didn't help — git failed to correctly handle the changed parts and this led to even more merge conflicts. So, the plan is to revert the code style changes, apply the pending PRs, and reintroduce the code style changes.
50c9ed2
to
8f189ab
Compare
@@ -0,0 +1,80 @@ | |||
// Copyright 2023 litep2p developers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: At some point I think we'd need to change this to paritytech
// Copyright 2023-2024 Parity Technologies (UK) Ltd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can gradually append this line to modified files. I think we should also keep the original copyright that basically refers to Aaro, as litep2p started as his side project.
let config = Litep2pConfigBuilder::new() | ||
.with_keypair(Keypair::generate()) | ||
.with_webrtc(Config { | ||
listen_addresses: vec!["/ip4/192.168.1.170/udp/8888/webrtc-direct".parse().unwrap()], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The block-announces protocol uses e143f23803ac50e8f6f8e62695d1ce9e4e1d68aa36c1cd2cfd15340213f3423e
which is the genesis of Westend, however I cannot connect to this addr.
Is there another way to test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The listen address should be updated to 127.0.0.1. Or does the WebRTC spec impose the limitations of connecting to loopback addresses?
Then, I guess, for this test to succeed another Westend node with litep2p backend is needed configured to dial this listen address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking into account the noise prologue comment, it looks like smoldot is needed as a second node for this test to succeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test was used to see how litep2p works with smoldot. It doesn't really test anything which is why it was disabled. How it is you run the test RUST_LOG=info cargo test webrtc_test -- --ignored
, copy the printed local address (127.0.0.1 didn't work for some reason) and go to https://tomaka.github.io/test-browser-node/index.html
WebRTC testing branch is still active (commit), if you want to test WebRTC with Polkadot. If you only test smoldot with litep2p it will terminate the connection pretty soon because the syncing handshake is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @altonen for the explanation!
}; | ||
|
||
#[tokio::test] | ||
#[ignore] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the test is ignored because the IP address is not reachable anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably is intended to be run manually with a companion node.
Vec::with_capacity(PREFIX.len() + local_fingerprint.len() + remote_fingerprint.len()); | ||
prologue.extend_from_slice(PREFIX); | ||
prologue.extend_from_slice(&remote_fingerprint); | ||
prologue.extend_from_slice(&local_fingerprint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this is <prefix><fingerprint A><fingerprint B>
, depending on whatever this is the handshake responder or the initiator.
(extracted from: https://github.com/libp2p/specs/blob/master/webrtc/webrtc-direct.md#connection-security)
Do we need to handle the case where we are the responder? Ie <prefix><local><remote>
instead of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is it's based on the role of smoldot in the WebRTC noise handshake. It looks like we don't support litep2p to litep2p WebRTC connections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR only implemented WebRTC Direct (browser <-> server), not server <-> server.
*self.state.lock() = State::SendClosed; | ||
} | ||
|
||
if flags & 2 == Flag::ResetStream as i32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dq: We don't care about Flag::FinAck
because we are closing the connection either way? (https://github.com/libp2p/specs/blob/master/webrtc/README.md#shared-concepts)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why we apply binary &
, even though flags seem to be exclusive and not a bitmask. Also, this way FinAck
is handled by conditions flags & 1 == StopSending
and flags & 2 == ResetStream
, so this is probably the reason we don't have a separate condition for FinAck
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would merge as is and refactor this in a follow-up PR.
@@ -36,7 +36,7 @@ simple-dns = "0.5.3" | |||
smallvec = "1.10.0" | |||
snow = { version = "0.9.3", features = ["ring-resolver"], default-features = false } | |||
socket2 = { version = "0.5.5", features = ["all"] } | |||
str0m = "0.2.0" | |||
str0m = "0.4.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str0m = "0.4.1" | |
str0m = "0.5.0" |
The str0m
blocker has been fixed (algesten/str0m@7ed5143) and is included in the 0.5.0 release. I think we should be using the latest version since it includes a few other fixes https://github.com/algesten/str0m/blob/main/CHANGELOG.md#050
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM based on my limited view of the WebRTC 👍
There are a few breaking changes when updating to str0m 0.5, we could probably handle those in a later PR, since some of the changes are not entirely trivial
Co-authored-by: Alexandru Vasile <[email protected]>
## [0.5.0] - 2023-05-24 This is a small patch release that makes the `FindNode` command a bit more robst: - The `FindNode` command now retains the K (replication factor) best results. - The `FindNode` command has been updated to handle errors and unexpected states without panicking. ### Changed - kad: Refactor FindNode query, keep K best results and add tests ([#114](#114)) ## [0.4.0] - 2023-05-23 This release introduces breaking changes to the litep2p crate, primarily affecting the `kad` module. Key updates include: - The `GetRecord` command now exposes all peer records, not just the latest one. - A new `RecordType` has been introduced to clearly distinguish between locally stored records and those discovered from the network. Significant refactoring has been done to enhance the efficiency and accuracy of the `kad` module. The updates are as follows: - The `GetRecord` command now exposes all peer records. - The `GetRecord` command has been updated to handle errors and unexpected states without panicking. Additionally, we've improved code coverage in the `kad` module by adding more tests. ### Added - Add release checklist ([#115](#115)) - Re-export `multihash` & `multiaddr` types ([#79](#79)) - kad: Expose all peer records of `GET_VALUE` query ([#96](#96)) ### Changed - multistream_select: Remove unneeded changelog.md ([#116](#116)) - kad: Refactor `GetRecord` query and add tests ([#97](#97)) - kad/store: Set memory-store on an incoming record for PutRecordTo ([#88](#88)) - multistream: Dialer deny multiple /multistream/1.0.0 headers ([#61](#61)) - kad: Limit MemoryStore entries ([#78](#78)) - Refactor WebRTC code ([#51](#51)) - Revert "Bring `rustfmt.toml` in sync with polkadot-sdk (#71)" ([#74](#74)) - cargo: Update str0m from 0.4.1 to 0.5.1 ([#95](#95)) ### Fixed - Fix clippy ([#83](#83)) - crypto: Don't panic on unsupported key types ([#84](#84)) --------- Signed-off-by: Alexandru Vasile <[email protected]>
This PR refactors the WebRTC code into something that actually works. The
WebRtcConnection
code is split intoOpeningWebRtcConnection
andWebRtcConnection
. The former only deals with the Noise handshake whereas the latter implements a typical connection event loop, dealing with inbound/outbound substreams and data exchange. The PR also implements a new substream type for WebRTC which is implemented usingtokio::sync::mpsc::{Sender, Receiver}
sincestr0m
doesn't expose a substream type. Themultistream-select
code used by WebRTC is refactored and its test coverage is increased.This PR also updates
str0m
to the latest version but during testing it was discovered that there are issues withChannelId
/SCTP stream id reuse and those have to be reported to upstream and fixed. They're are not blockers for merging this PR though because the fixes should be contained tostr0m
and for litep2p it would only be a version bump if the fixes are accepted.The WebRTC implementation (with pending fixes to
str0m
) has been tested with smoldot and the code works. IPFS Ping, Kademlia and Identify all work, opening and negotiating notification protocols work and block requests work. There are, however, two issues that I was able to find:It looks like smoldot is still able to receive block announcements and GRANDPA notifications so I'm not entirely what the issue here is.
This happens on the first try, gossip is closed, then it's opened again and it's able to start warp sync.