Failing debug_assert in Request-Response protocol #4773
Comments
We've landed some big internal changes to … Can you retest with the latest master to see whether this still happens?
I am still able to reproduce the error on master. Here is a branch with a reproducer against master. To reproduce, run the following from within the … directory: … and …

Logs:
I took a stab at a fix in #4777; I'm open to other ideas for how to fix this.
Hi, I am facing the same errors when running our Filecoin Rust client in debug mode (using …).
Have you tried re-ordering your behaviours so that all connection-management ones come first? See #4777 (comment).
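The ordering suggestion can be illustrated with a std-only toy model. It assumes, as the suggestion implies, that a derived behaviour consults its fields in declaration order and that the first denial short-circuits the rest. Composed, check_limits, rr_seen and Denied are hypothetical names, not libp2p APIs, and the model ignores how the real connection-limits behaviour updates its counters from swarm events.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for libp2p's ConnectionId.
type ConnectionId = u64;

#[derive(Debug, PartialEq)]
struct Denied;

/// Toy model of a derived behaviour with two fields: a request-response-like
/// part that records every connection it is told about, and a
/// connection-limits-like part that caps established connections.
struct Composed {
    limits_first: bool,
    max_established: usize,
    established: usize,
    /// Connections the request-response model believes are established.
    rr_seen: HashSet<ConnectionId>,
}

impl Composed {
    fn new(limits_first: bool, max_established: usize) -> Self {
        Self { limits_first, max_established, established: 0, rr_seen: HashSet::new() }
    }

    fn check_limits(&self) -> Result<(), Denied> {
        if self.established >= self.max_established { Err(Denied) } else { Ok(()) }
    }

    /// Consults the sub-behaviours in field order; the first denial
    /// short-circuits the rest (assumed to mirror `?` in derived code).
    fn handle_established(&mut self, id: ConnectionId) -> Result<(), Denied> {
        if self.limits_first {
            self.check_limits()?;
            self.rr_seen.insert(id);
        } else {
            self.rr_seen.insert(id);
            self.check_limits()?;
        }
        self.established += 1;
        Ok(())
    }
}

fn main() {
    for limits_first in [false, true] {
        let mut swarm = Composed::new(limits_first, 2);
        // Four concurrent dials to the same peer, per-peer limit of 2.
        let denied: Vec<ConnectionId> =
            (0..4).filter(|&id| swarm.handle_established(id).is_err()).collect();
        // Denied connections that request-response nevertheless recorded:
        // these are the ids that would later also arrive as DialFailure.
        let conflicting: Vec<ConnectionId> =
            denied.iter().copied().filter(|id| swarm.rr_seen.contains(id)).collect();
        println!("limits_first={limits_first}: denied={denied:?}, conflicting={conflicting:?}");
    }
}
```

With the request-response part first, the denied connections have already been recorded by it, matching the established-then-DialFailure sequence described in this issue; with the limits first, request-response never hears about them.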
Summary
I am hitting panics in debug mode at this debug_assert.
While the debug_assert that panics is in the request-response code, I do NOT think this is a bug in request-response. I took a look, and it seems to be caused by a combination of factors.
I made a smallish reproducer by hacking the file-sharing example on this branch. I added the mdns behaviour so that I can easily get multiple in-flight dial requests, and the libp2p_connection_limits behaviour so that it's easy to deny some of those requests. I set the max-per-peer limit to 2. Finally, to reproduce, I run these commands from the example/file-sharing directory:

Provider
Note that using /ip4/0.0.0.0/ binds to multiple listen addresses. I have 4-5 local interfaces, so I listen on that many addresses. This means that when mdns discovers a peer, each of its addresses is dialed separately, so we end up with several concurrent dial requests to the same peer.

Retriever
It seems that the request-response behaviour is receiving an inconsistent set of events: it receives both a call to handle_established_outbound_connection and a DialFailure swarm event for the same connection ID. This breaks the assumptions of request-response's internal state management and triggers the debug_assert failure.

Expected behavior
I expect debug_asserts not to panic. Additionally, I expect a single connection ID to be considered either failed or established, but not both.
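The expectation above amounts to a per-connection invariant. A minimal sketch of a tracker that checks it, assuming plain u64 IDs in place of libp2p's ConnectionId; ConnTracker and ConnState are hypothetical names and not the request-response internals. It reports the conflict instead of panicking, whereas the code in this issue hits a debug_assert.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for libp2p's ConnectionId.
type ConnectionId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnState {
    Established,
    Failed,
}

/// Enforces "either established or failed, never both" per connection id.
#[derive(Default)]
struct ConnTracker {
    states: HashMap<ConnectionId, ConnState>,
}

impl ConnTracker {
    /// Records an event for `id`. The first state seen is kept; a later,
    /// different state is a conflict and is returned as Err(previous).
    fn record(&mut self, id: ConnectionId, new: ConnState) -> Result<(), ConnState> {
        let prev = *self.states.entry(id).or_insert(new);
        if prev == new { Ok(()) } else { Err(prev) }
    }
}

fn main() {
    let mut tracker = ConnTracker::default();
    // Sequence from the issue: the outbound connection is reported as
    // established, then a DialFailure arrives for the same id.
    tracker.record(7, ConnState::Established).expect("first event never conflicts");
    match tracker.record(7, ConnState::Failed) {
        Ok(()) => println!("connection 7 consistent"),
        Err(prev) => println!("connection 7 conflict: already {prev:?}"),
    }
}
```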
Actual behavior
The debug_assert panics because the request-response logic receives conflicting information about a connection.
Relevant log output
Possible Solution
One solution is to make request-response more robust to these inconsistent events from the swarm, but that seems like a fix that would only hide the real problem.
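For comparison, the more tolerant approach could look roughly like this sketch: a DialFailure for an ID that was already reported as established is cleaned up instead of asserted on. PeerId and ConnectionId are stood in by plain integers, and TolerantState is a hypothetical name, not request-response's actual state. As noted above, this would only mask the inconsistency coming from the swarm.

```rust
use std::collections::HashMap;

type PeerId = u32; // hypothetical stand-in for libp2p's PeerId
type ConnectionId = u64; // hypothetical stand-in for libp2p's ConnectionId

/// Per-peer list of connections the behaviour believes are established.
#[derive(Default)]
struct TolerantState {
    connected: HashMap<PeerId, Vec<ConnectionId>>,
}

impl TolerantState {
    fn on_established(&mut self, peer: PeerId, id: ConnectionId) {
        self.connected.entry(peer).or_default().push(id);
    }

    /// Instead of asserting, drops any record of `id` and returns whether it
    /// had previously been reported as established (i.e. a conflict).
    fn on_dial_failure(&mut self, peer: PeerId, id: ConnectionId) -> bool {
        let Some(conns) = self.connected.get_mut(&peer) else { return false };
        let before = conns.len();
        conns.retain(|&c| c != id);
        let was_established = conns.len() != before;
        if conns.is_empty() {
            self.connected.remove(&peer);
        }
        was_established
    }
}

fn main() {
    let mut state = TolerantState::default();
    state.on_established(1, 42);
    // The conflicting DialFailure from the issue: same connection id.
    let conflicted = state.on_dial_failure(1, 42);
    println!("conflict handled: {conflicted}, peers left: {}", state.connected.len());
}
```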
Version
v0.52.4
The reproducer branch is based off the libp2p-v0.52.4 tag.

Would you like to work on fixing this bug?
Yes