Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p2p/discover: fix update logic in handleAddNode #29836

Merged
merged 12 commits into from
May 28, 2024

Conversation

lightclient
Copy link
Member

@lightclient lightclient commented May 24, 2024

It seems the semantic differences between addFoundNode and addInboundNode were lost in
#29572. My understanding is addFoundNode is for a node you have not contacted directly
(and are unsure if is available) whereas addInboundNode is for adding nodes that have
contacted the local node and we can verify they are active.

handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in
the bucket (updating it's IP addr) even if the node was not an inbound. This PR fixes
this. It wasn't originally caught in tests like TestTable_addSeenNode because the
manipulation of the node object actually modified the node value used by the test.

New logic is added to reject non-inbound updates unless the sequence number of the
(signed) ENR increases. Inbound updates, which are published by the updated node itself,
are always accepted. If an inbound update changes the endpoint, the node will be
revalidated on an expedited schedule.

@fjl
Copy link
Contributor

fjl commented May 24, 2024

It's better to store *enode.Node because that object is immutable. We just need to protect the access to the field node.Node.

@fjl
Copy link
Contributor

fjl commented May 24, 2024

Actually, I think this problem will go away with the refactoring where node is only used in table. The race happens because we return *nodesByDistance from Table.findnodeByID, which contains *node. When we access the inner node later, there is a race with the table updating it.

@fjl
Copy link
Contributor

fjl commented May 24, 2024

handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in the bucket (updating it's IP addr) even if the node was not an inbound.

I don't think it's a problem. bumpInBucket is safe to invoke even for non-inbound nodes.

@lightclient
Copy link
Member Author

Is it correct to update the IP of non inbound add? This cause the test TestTable_addSeenNode to fail.

@fjl
Copy link
Contributor

fjl commented May 25, 2024

I've thought about it a bunch more, and it used to be correct to not update the endpoint for found nodes, especially since the endpoint information is not authenticated in discv4. However, it's different in discv5 with ENRs. If we find a newer record, we want to put it into the table. I think we should resolve this by adding a check for the sequence number. When we encounter a newer record, update it, regardless of req.isInbound. As a special case we can also update the endpoint if req.isInbound && node.Seq() == 0 for discv4.

@fjl
Copy link
Contributor

fjl commented May 25, 2024

Pretty happy you looked into this because I didn't see these issues.

@Mazzika1
Copy link

Thank you

@lightclient
Copy link
Member Author

Sure! I can update it.

@fjl fjl changed the title p2p/discover: copy enode in node wrapper to avoid race p2p/discover: fix update logic in handleAddNode May 26, 2024
@fjl
Copy link
Contributor

fjl commented May 26, 2024

Node refactoring is here #29844. It should fix the race, and this PR can just be about the handleAddNode thing.

@lightclient
Copy link
Member Author

Updated.

// it is allowed to update its own entry.
n = b.entries[i]
isUpdate := newRecord.Seq() > n.Seq()
isDiscv4Update := n.Seq() == 0 && newRecord.Seq() == 0 && isInbound
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For discv5 the message has already been authenticated by now, correct? So that means only the actual node could double sign sequence 0?

Is it an issue that now in discv5 inbound contacts can update their endpoint by replaying sequence 0?

Copy link
Contributor

@fjl fjl May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's an issue. We could always be even stricter and check if it's an unsigned record. However, that seems excessive. We previously accepted all inbound updates, even ones with a lower sequence number, and in fact this is something I am still considering to add back. Endpoint information provided by the node itself can probably always be considered valid.

Copy link
Contributor

@fjl fjl May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I'm still exploring here is the interaction with revalidation. If a node changes endpoint via this update path, we should also move it to the fast reval queue again. This requires rethinking the queue management a bit.

@fjl fjl force-pushed the discover-race-fix branch from a7101a1 to 3148497 Compare May 28, 2024 16:54
@fjl fjl merged commit cc22e0c into ethereum:master May 28, 2024
2 of 3 checks passed
@fjl fjl added this to the 1.14.4 milestone May 28, 2024
jorgemmsilva pushed a commit to iotaledger/go-ethereum that referenced this pull request Jun 17, 2024
It seems the semantic differences between addFoundNode and addInboundNode were lost in
ethereum#29572. My understanding is addFoundNode is for a node you have not contacted directly
(and are unsure if is available) whereas addInboundNode is for adding nodes that have
contacted the local node and we can verify they are active.

handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in
the bucket (updating it's IP addr) even if the node was not an inbound. This PR fixes
this. It wasn't originally caught in tests like TestTable_addSeenNode because the
manipulation of the node object actually modified the node value used by the test.

New logic is added to reject non-inbound updates unless the sequence number of the
(signed) ENR increases. Inbound updates, which are published by the updated node itself,
are always accepted. If an inbound update changes the endpoint, the node will be
revalidated on an expedited schedule.

Co-authored-by: Felix Lange <[email protected]>
maoueh pushed a commit to streamingfast/go-ethereum that referenced this pull request Aug 1, 2024
It seems the semantic differences between addFoundNode and addInboundNode were lost in
(and are unsure if is available) whereas addInboundNode is for adding nodes that have
contacted the local node and we can verify they are active.

handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in
the bucket (updating it's IP addr) even if the node was not an inbound. This PR fixes
this. It wasn't originally caught in tests like TestTable_addSeenNode because the
manipulation of the node object actually modified the node value used by the test.

New logic is added to reject non-inbound updates unless the sequence number of the
(signed) ENR increases. Inbound updates, which are published by the updated node itself,
are always accepted. If an inbound update changes the endpoint, the node will be
revalidated on an expedited schedule.

Co-authored-by: Felix Lange <[email protected]>
stwiname pushed a commit to subquery/data-node-go-ethereum that referenced this pull request Sep 9, 2024
It seems the semantic differences between addFoundNode and addInboundNode were lost in
ethereum#29572. My understanding is addFoundNode is for a node you have not contacted directly
(and are unsure if is available) whereas addInboundNode is for adding nodes that have
contacted the local node and we can verify they are active.

handleAddNode seems to be the consolidation of those two methods, yet it bumps the node in
the bucket (updating it's IP addr) even if the node was not an inbound. This PR fixes
this. It wasn't originally caught in tests like TestTable_addSeenNode because the
manipulation of the node object actually modified the node value used by the test.

New logic is added to reject non-inbound updates unless the sequence number of the
(signed) ENR increases. Inbound updates, which are published by the updated node itself,
are always accepted. If an inbound update changes the endpoint, the node will be
revalidated on an expedited schedule.

Co-authored-by: Felix Lange <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants