Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core)!: make ConnectionIds globally unique #3327

Merged
merged 16 commits into from
Jan 23, 2023

Conversation

thomaseizinger
Copy link
Contributor

@thomaseizinger thomaseizinger commented Jan 16, 2023

Description

Instead of offering a public constructor, users are now no longer able to construct ConnectionIds at all. They only public API exposed are the derived traits. Internally, ConnectionIds are monotonically incremented using a static atomic counter, thus no two connections will ever get assigned the same ID.

Notes

Extracted out of #3254.

Links to any relevant issues

Open Questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger thomaseizinger requested review from jxs and mxinden January 16, 2023 01:34
@thomaseizinger thomaseizinger marked this pull request as ready for review January 16, 2023 01:36
@thomaseizinger thomaseizinger changed the base branch from master to no-run-title-as-command January 16, 2023 10:42
core/src/connection.rs Outdated Show resolved Hide resolved
core/src/connection.rs Outdated Show resolved Hide resolved
@thomaseizinger thomaseizinger marked this pull request as draft January 17, 2023 03:25
@thomaseizinger
Copy link
Contributor Author

Converting to draft until #3318 is merged to prevent accidental merges.

@thomaseizinger
Copy link
Contributor Author

This is ready for review despite being draft. Only #3318 needs to merge first.

Base automatically changed from no-run-title-as-command to master January 17, 2023 22:13
@thomaseizinger thomaseizinger marked this pull request as ready for review January 18, 2023 01:13
@mergify
Copy link
Contributor

mergify bot commented Jan 18, 2023

This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏

Comment on lines +60 to +67
/// A "dummy" [`ConnectionId`].
///
/// This is primarily useful for creating connection IDs
/// in test environments. There is in general no guarantee
/// that all connection IDs are based on non-negative integers.
pub fn new(id: usize) -> Self {
Self(id)
}
}

impl std::ops::Add<usize> for ConnectionId {
type Output = Self;

fn add(self, other: usize) -> Self {
Self(self.0 + other)
/// Really, you should not use this, not even for testing but it is here if you need it.
#[deprecated(
since = "0.42.0",
note = "Don't use this, it will be removed at a later stage again."
)]
pub const DUMMY: ConnectionId = ConnectionId(0);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very happy about this but I propose to accept this technical debt to be able to move forward with the connection management PR.

Long-term, we would need to re-write the gossipsub and kademlia tests to not manually call lifecycle functions and thus only have Swarm construct ConnectionIds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@libp2p/rust-libp2p-maintainers Please voice your opinion on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it would be nice if this was possible. Other than that I don't see a better alternative for here.

use std::task::Waker;
use std::time::Duration;
use std::{fmt, io, mem, pin::Pin, task::Context, task::Poll};

static NEXT_CONNECTION_ID: AtomicUsize = AtomicUsize::new(1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We start IDs at 1 so they can never clash with the DUMMY one below.

Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +60 to +67
/// A "dummy" [`ConnectionId`].
///
/// This is primarily useful for creating connection IDs
/// in test environments. There is in general no guarantee
/// that all connection IDs are based on non-negative integers.
pub fn new(id: usize) -> Self {
Self(id)
}
}

impl std::ops::Add<usize> for ConnectionId {
type Output = Self;

fn add(self, other: usize) -> Self {
Self(self.0 + other)
/// Really, you should not use this, not even for testing but it is here if you need it.
#[deprecated(
since = "0.42.0",
note = "Don't use this, it will be removed at a later stage again."
)]
pub const DUMMY: ConnectionId = ConnectionId(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it would be nice if this was possible. Other than that I don't see a better alternative for here.


/// Returns the next available [`ConnectionId`].
pub(crate) fn next() -> Self {
Self(NEXT_CONNECTION_ID.fetch_add(1, Ordering::SeqCst))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very elegant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Credits go to @rkuhn !

@mergify
Copy link
Contributor

mergify bot commented Jan 19, 2023

This pull request has merge conflicts. Could you please resolve them @thomaseizinger? 🙏

@thomaseizinger
Copy link
Contributor Author

Thanks for the review @jxs! @mxinden I'd like you to have a look at this too before I merge it.

mxinden added a commit to mxinden/rust-libp2p that referenced this pull request Jan 23, 2023
Note that this is only really valid once we have libp2p#3327 and
the corresponding patch in `libp2p-swarm` `DialOpts`.
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with the technical dept. I suggest moving forward here.

@mxinden
Copy link
Member

mxinden commented Jan 23, 2023

rust-libp2p-head x rust-libp2p-head is failing. Error seems to be:

   rust-libp2p-headxrust-libp2p-headwebrtc-dialer-1    | [2023-01-23T12:07:47Z WARN  webrtc::peer_connection::peer_connection_internal] failed to open data channel: there already exists a stream with identifier
  rust-libp2p-headxrust-libp2p-headwebrtc-listener-1  | Error: Failed to wait for dialer conclusion

@thomaseizinger
Copy link
Contributor Author

rust-libp2p-head x rust-libp2p-head is failing. Error seems to be:

   rust-libp2p-headxrust-libp2p-headwebrtc-dialer-1    | [2023-01-23T12:07:47Z WARN  webrtc::peer_connection::peer_connection_internal] failed to open data channel: there already exists a stream with identifier
  rust-libp2p-headxrust-libp2p-headwebrtc-listener-1  | Error: Failed to wait for dialer conclusion

Isn't that the flake that came up a couple of times now?

@jxs
Copy link
Member

jxs commented Jan 23, 2023

rust-libp2p-head x rust-libp2p-head is failing. Error seems to be:

   rust-libp2p-headxrust-libp2p-headwebrtc-dialer-1    | [2023-01-23T12:07:47Z WARN  webrtc::peer_connection::peer_connection_internal] failed to open data channel: there already exists a stream with identifier
  rust-libp2p-headxrust-libp2p-headwebrtc-listener-1  | Error: Failed to wait for dialer conclusion

Isn't that the flake that came up a couple of times now?

yup exactly

@mergify mergify bot merged commit 778f7a2 into master Jan 23, 2023
@mergify mergify bot deleted the 2824-unique-connection-ids branch January 23, 2023 16:29
@thomaseizinger
Copy link
Contributor Author

I am tracking removal of the tech-debt in #3371.

mergify bot pushed a commit that referenced this pull request Jan 25, 2023
…r` (#3384)

The gossipsub tests are calling lifecycle functions of the `NetworkBehaviour` that aren't meant to be called outside of `Swarm`. This already surfaced as a problem in #3327 and it is coming up again in #3254 where `new_handler` gets deprecated.

Try to mitigate that by constructing a dummy handler instead. Functionally, there is no difference as in both cases, the given handler has never seen a connection.
jxs pushed a commit to jxs/rust-libp2p that referenced this pull request Jan 25, 2023
…r` (libp2p#3384)

The gossipsub tests are calling lifecycle functions of the `NetworkBehaviour` that aren't meant to be called outside of `Swarm`. This already surfaced as a problem in libp2p#3327 and it is coming up again in libp2p#3254 where `new_handler` gets deprecated.

Try to mitigate that by constructing a dummy handler instead. Functionally, there is no difference as in both cases, the given handler has never seen a connection.
@melekes
Copy link
Contributor

melekes commented Mar 15, 2023

A bit late to the party, but we (@paritytech) rely on being able to construct ConnectionId in our tests - https://github.com/paritytech/substrate/blob/master/client/network/src/protocol/notifications/behaviour.rs#L3391

Would you be willing to change your opinion on making ConnectionId constructor private?

@thomaseizinger
Copy link
Contributor Author

A bit late to the party, but we (@paritytech) rely on being able to construct ConnectionId in our tests - paritytech/substrate@master/client/network/src/protocol/notifications/behaviour.rs#L3391

Would you be willing to change your opinion on making ConnectionId constructor private?

We do too in some tests but my current position is that you shouldn't call those lifecycle functions yourself but instead construct two or more Swarms and establish a connection between them, possible using the new libp2p-swarm-test: https://github.com/libp2p/rust-libp2p/tree/master/swarm-test

Does that work for you?

@mxinden I think we should publish libp2p-swarm-test to crates.io.

@melekes
Copy link
Contributor

melekes commented Mar 17, 2023

@mxinden I think we should publish libp2p-swarm-test to crates.io.

please do 🙏

We do too in some tests but my current position is that you shouldn't call those lifecycle functions yourself but instead construct two or more Swarms and establish a connection between them, possible using the new libp2p-swarm-test: https://github.com/libp2p/rust-libp2p/tree/master/swarm-test
Does that work for you?

There are over 2000 lines of tests, but I think it may be possible to rewrite them using Swarms.

@melekes
Copy link
Contributor

melekes commented Mar 17, 2023

In some tests we need to inject custom handshake and I'm worried it won't be possible with libp2p-swarm-test API

notif.on_connection_handler_event(
			peer,
			conn1,
			conn_yielder.open_substream(peer, 0, connected.clone(), vec![1, 2, 3, 4]),
		);

Overall this change is quite upsetting because we just want to test our state machine built on top of libp2p so spawning libp2p Swarms under it is overcomplicating the issue

@thomaseizinger
Copy link
Contributor Author

Overall this change is quite upsetting because we just want to test our state machine built on top of libp2p so spawning libp2p Swarms under it is overcomplicating the issue

Thank you for sharing.

We can re-introduce a public constructor for ConnectionId.

However, I think these whitebox tests are not a good idea. For example, recently we introduced callbacks for connection management that also involves the behaviour upon pending connections. Is your home-grown test framework calling these correctly?

Going through Swarm for your tests is going to simulate a real network much better than calling these lifecycle functions yourself.

@melekes
Copy link
Contributor

melekes commented Mar 21, 2023

Is your home-grown test framework calling these correctly?

Probably not, which is why I agree that we should use Swarms as much as possible. BUT at the same time

a) we want to ship 0.51.1 libp2p, which is currently blocked on this
b) rewriting 2000 lines of tests will take some time
c) libp2p-swarm-test is not even published.

I'd humbly propose to reintroduce ConnectionId::new and mark it as deprecated (it was not marked as deprecated before?). This will give us some time to rewrite the tests. Thank you 🙏

@thomaseizinger
Copy link
Contributor Author

c) libp2p-swarm-test is not even published.

We are working on that:

I'd humbly propose to reintroduce ConnectionId::new and mark it as deprecated (it was not marked as deprecated before?). This will give us some time to rewrite the tests. Thank you pray

Yep, we can do that. I'll send a PR.

@thomaseizinger
Copy link
Contributor Author

I'd humbly propose to reintroduce ConnectionId::new and mark it as deprecated (it was not marked as deprecated before?). This will give us some time to rewrite the tests. Thank you pray

Yep, we can do that. I'll send a PR.

Actually, with how we have now modeled DialOpts, there isn't any risk in users creating an already used ConnectionId as part of dialing a peer.

mergify bot pushed a commit that referenced this pull request Mar 21, 2023
In earlier iterations of the design for generic connection management, we removed the `ConnectionId::new` constructor because it would have allowed users to create `ConnectionId`s that are already taken, thus breaking invariants that `NetworkBehaviour`s rely on. Later, we incorporated the creation of `ConnectionId` in `DialOpts` which mitigates this risk altogether.

Thus, it is reasonably safe to introduce a public, non-deprecated constructor for `ConnectionId` that can be used for tests.

Related #3327 (comment).

Pull-Request: #3652.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants