Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi: query chan update timestamps #8030

Merged
merged 14 commits into from
Dec 11, 2023

Conversation

ellemouton
Copy link
Collaborator

@ellemouton ellemouton commented Sep 25, 2023

In this PR, we start supporting the timestamps query option in the query_channel_range message. If we receive such a message with that bit set then we now populate the timestamps field of the reply_channel_range message.

With this feature we can now ask for the timestamps for the latest channel updates for a channel to be sent along with the SCID. Thus, if we have marked one of the channels in the list as a zombie but we see that the timestamp sent with it make it not a zombie, then we request the channel_update messages for that channel so that we can perhaps awaken the channel.

How to test?

See this comment: #8030 (comment)

Fixes #7989

@ellemouton ellemouton force-pushed the queryChanUpdateTimestamps branch from afc8971 to d922d67 Compare September 25, 2023 13:10
@saubyk saubyk added this to the v0.18.0 milestone Sep 25, 2023
Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Main comment re if we should attempt to sanity check the timestamps we get from the remote party.

One way to test this out in the wild would be: create a new node on mainnet, no channels, set the zombie prune horizon to something short like 6 hrs, run this patch, and observe the zombie resurrection happen or not. We might want to add some extra logging statements wehn we realize we need to go fetch a zombie chan due to getting an update.

lnwire/query_channel_range.go Show resolved Hide resolved
// Encoding is an enum-like type that represents exactly how a set data is
// encoded on the wire. The set of encodings allows us to take advantage of the
// structure of the given list to achieving a high degree of compression.
type Encoding uint8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScidEncoding? As not a generic encoding that'll allow say proto buf instead of tlv or w/e.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that it isnt only used for Scid's though. It just defines how a wire message field type is encoded.

See in the commit named lnwire: add timestamps to ReplyChannelRange msg how it is used to maybe encode timestamps.

I could rename to ByteEncoding if that helps? just went for the terminology used in bolt 7

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think Encoding seems too generic. Some other ideas:

  • SortedEncoding
  • QueryEncoding

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

went with QueryEncoding 👍

lnwire/timestamps.go Outdated Show resolved Hide resolved
lnwire/timestamps.go Outdated Show resolved Hide resolved
channeldb/graph.go Show resolved Hide resolved
channeldb/graph.go Outdated Show resolved Hide resolved
discovery/gossiper.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
routing/router.go Outdated Show resolved Hide resolved
isZombie, _, _ := isZombieEdge(
zombieIndex, scid,
)
if isZombie && isZombieChan(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, this makes sense, so scratch the idea about the cache I had. I had envisioned a look up here each time to double check our timestamp with theirs.

Thinking about it a bit more, perhaps we should do some sort of sanity check? Otherwise, they can send bogus timestamps and make us request everything all over again.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would such a sanity check work? I imagine a malicious peer could trivially defeat any check by setting timestamps to a recent time.

@saubyk saubyk added gossip graph wire protocol Encoding of messages for the communication between nodes labels Oct 28, 2023
Copy link
Collaborator

@Crypt-iQ Crypt-iQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still in the process of reviewing, but I think the chunkSize needs to be updated if timestamps are sent because instead of room for ~8000 scid's, there's only room for ~4000 because of the 2 4-byte timestamps

@ellemouton ellemouton force-pushed the queryChanUpdateTimestamps branch 2 times, most recently from 481d366 to 65062c4 Compare November 8, 2023 12:35
@ellemouton
Copy link
Collaborator Author

Still in the process of reviewing, but I think the chunkSize needs to be updated if timestamps are sent because instead of room for ~8000 scid's, there's only room for ~4000 because of the 2 4-byte timestamps

Great catch - thanks! Addressed 👍

Main comment re if we should attempt to sanity check the timestamps we get from the remote party.

👍 for now i've added a place holder that currently only checks that the times are not in the future. Will ponder more over if more can be done & if it is worth doing something more complicated like checking that what they end up sending matches the timestamps they claimed to have.

One way to test this out in the wild would be: create a new node on mainnet, no channels, set the zombie prune horizon to something short like 6 hrs, run this patch, and observe the zombie resurrection happen or not. We might want to add some extra logging statements wehn we realize we need to go fetch a zombie chan due to getting an update.

Good idea 👍 I'll do this next

@ellemouton ellemouton force-pushed the queryChanUpdateTimestamps branch 2 times, most recently from 2034c28 to 24dcb3a Compare November 8, 2023 13:31
@ellemouton ellemouton force-pushed the queryChanUpdateTimestamps branch 9 times, most recently from bd51d8c to e3aa151 Compare November 13, 2023 12:43
@ellemouton
Copy link
Collaborator Author

Did the following manual testing:

Confirmed issue:

  1. Set up a regtest env where there are a bunch of chans but main node, alice, has no chans but learns of all the other channels.
  2. Set alice's ChannelPruneExpiry and GraphPruneInterval to low values so that the node quickly marks all the other channels as zombies. (a quick way to verify this is to do a describegraph and see that there are no edges.
  3. Shutdown alice
  4. now perform some channel updates on the existing channels so that there are some ChannelUpdates that alice misses while the node is offline.
  5. Start alice , connect alice to a node in the network & see that Alice ignores all the SCIDs in the lnwire.ReplyChannelRange message due to them all being marked as zombies. (describegraph still empty).

PR fixes issue:

Restart alice with this PR and perform the above steps. At step 5, alice will mark the channels as alive and will re-add them to her graph.

@ellemouton
Copy link
Collaborator Author

This is ready for another round (see above comment for how to test) - the final thing I need to ponder on is how strict we really need to be with the timestamps they send us.

@saubyk
Copy link
Collaborator

saubyk commented Nov 13, 2023

...
5. Start alice , connect alice to a node in the network & see that Alice ignores all the SCIDs in the lnwire.ReplyChannelRange message due to them all being marked as zombies. (describegraph still empty).

PR fixes issue:

Restart alice with this PR and perform the above steps. At step 5, alice will mark the channels as alive and will re-add them to her graph.

Great explanation of the test 🤩

Concept ACK

@ellemouton ellemouton force-pushed the queryChanUpdateTimestamps branch from e3aa151 to f33cd7e Compare November 14, 2023 09:07
discovery/syncer.go Outdated Show resolved Hide resolved
// Encoding is an enum-like type that represents exactly how a set data is
// encoded on the wire. The set of encodings allows us to take advantage of the
// structure of the given list to achieving a high degree of compression.
type Encoding uint8
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think Encoding seems too generic. Some other ideas:

  • SortedEncoding
  • QueryEncoding

lnwire/encoding.go Outdated Show resolved Hide resolved
lnwire/timestamps.go Outdated Show resolved Hide resolved
lnwire/timestamps.go Outdated Show resolved Hide resolved
lnwire/timestamps.go Outdated Show resolved Hide resolved
lnwire/reply_channel_range.go Outdated Show resolved Hide resolved
lnwire/reply_channel_range.go Show resolved Hide resolved
lnwire/reply_channel_range_test.go Outdated Show resolved Hide resolved
channeldb/graph.go Show resolved Hide resolved
@ellemouton ellemouton force-pushed the queryChanUpdateTimestamps branch 7 times, most recently from bd50f11 to 2bebc49 Compare November 22, 2023 11:17
@ellemouton ellemouton requested a review from morehouse November 22, 2023 13:11
Copy link
Collaborator

@morehouse morehouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

channeldb/graph.go Show resolved Hide resolved
This commit adds a new QueryOptions type which will later be used in the
QueryReplyRange message.
Since the the encoding can be used for multiple different fields, we
rename it here to be more generic.
This is for later when the timestamps also need to be sorted according
to the SCIDs.
The `edgeIndexBucket` key name is used to key a sub-bucket within the
`edgeBucket` bucket. And so the `edgeIndexBucket` is not a top level
bucket.
This commit also adds a new `protocol.no-timestamp-query-option` option
to disable the new behaviour.
This commit is a set-up commit. It extracts the logic from
`MarkEdgeLive` to a helper `markEdgeLive` method so that the logic can
be called from within other kvdb.Update blocks. This will be used in the
next commit.
@ellemouton ellemouton force-pushed the queryChanUpdateTimestamps branch from 2bebc49 to 645eb44 Compare December 11, 2023 07:13
@lightninglabs-deploy
Copy link

@Roasbeef: review reminder

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🌲

@Roasbeef Roasbeef merged commit 2fee3f6 into lightningnetwork:master Dec 11, 2023
24 of 25 checks passed
@ellemouton ellemouton deleted the queryChanUpdateTimestamps branch December 11, 2023 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gossip graph wire protocol Encoding of messages for the communication between nodes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lnwire+discovery: implement gossip query timestamp extensions
6 participants