chainfee: introduce filterManager and use it for fee floor #8418
Conversation
lnwallet/chainfee/estimator.go (Outdated)

```go
	return minFee
}

if filterFee > minFee {
```
Hmm, I wonder if we want to allow the user to override this if they really want to publish at that low fee. I guess ultimately, they can specify the `sat_per_vbyte` field in that case.
Setting `sat_per_vbyte` wouldn't override this because most call-sites end up invoking `EstimateFeePerKW` or comparing against `RelayFeePerKW`. Both of these call into the `filterManager`.
lnwallet/chainfee/filtermanager.go (Outdated)

```go
values []SatPerKWeight
valuesMtx sync.RWMutex

fetchFunc func(*rpcclient.Client) ([]SatPerKWeight, error)
```
I think we can abstract the fetching further. Assuming there's some closure that holds the client, it just becomes a normal callback. Then we also don't need to worry below about whether the backend is bitcoind or not.
Another route is making the `fetchFunc` into an interface, then having one version for `btcd` and the other for `bitcoind`. Then when we make the `filterManager` in the respective fee estimators, we pick the one based on the target backend.
Made it into a callback that holds the client, lmk what you think
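A minimal sketch of what this callback shape could look like. All names here (`fetchFunc`, `newFilterManager`, `mockClient`) are illustrative stand-ins for the PR's actual code, not the real lnd API:

```go
package main

import "fmt"

// SatPerKWeight mirrors the chainfee fee-rate type under discussion
// (satoshis per kilo-weight).
type SatPerKWeight int64

// fetchFunc is the backend-agnostic callback: it takes no client
// parameter because the closure captures whatever connection it needs.
type fetchFunc func() ([]SatPerKWeight, error)

// filterManager only knows about the callback, never about the backend.
type filterManager struct {
	fetch fetchFunc
}

// newFilterManager wires in whichever backend-specific closure the
// caller built (btcd or bitcoind flavored).
func newFilterManager(fetch fetchFunc) *filterManager {
	return &filterManager{fetch: fetch}
}

// mockClient stands in for an rpcclient.Client in this sketch.
type mockClient struct {
	filters []SatPerKWeight
}

func main() {
	client := &mockClient{filters: []SatPerKWeight{250, 300}}

	// The closure holds the client, so the manager needs no
	// backend-specific branches.
	fm := newFilterManager(func() ([]SatPerKWeight, error) {
		return client.filters, nil
	})

	got, err := fm.fetch()
	fmt.Println(got, err)
}
```

The design win is that the bitcoind/btcd distinction lives entirely at construction time, where each estimator builds its own closure.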
```go
	continue
}

// The feefilter is already in units of sat/KvB.
```
Ahh, didn't know we diverged there re displayed units.
Yup, Core displays it as a float.
lnwallet/chainfee/filtermanager.go (Outdated)

```go
	return false
}

// TODO: ignore values outside of a certain band (configurable?)
```
As in values above our max accepted fee?
Yeah. I think we need some sort of ceiling, but I don't know what to base it off of...
I think this can be put inside `PublishTransaction`, then every time we broadcast, we get the `minfeefilter` values from peers. If we are going with the median approach, we need to kick out the outliers first, so I guess the median value is better, though we could make this configurable so that percentiles can be used.
On the other hand, I think we want fresh data when publishing txns, because the `minfeefilter` is only relevant by the time we send txns to our peers?
```go
	return median, nil
}

type bitcoindPeerInfoResp struct {
```
Feel like this should be part of `chain` under `btcwallet/chain`.
Absolutely.
+1 for configurability and use of percentiles.
...moving median compared to a moving average...
So there are actually two dimensions being measured here, and the PR as written uses the median for both of them. The way the code reads now, we first collapse our set of peer fee filters down to a median value, and then we add it to the moving median window. So with that in mind, knowing that we use a moving median/average to reduce time-based jitter, what are we smoothing? Outlier malicious peers are already controlled for by the time we even get to the time-series data, because we take the median there. As such, I'm not sure we need to use a moving median to control for it, since any jitter that shows up in the time series will inherently be jitter in the median node's report and thus more likely already reflect broader market conditions.
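To make the two dimensions concrete, here is a self-contained sketch (the `median` helper and types are illustrative, not the PR's exact code) showing how cohort aggregation already absorbs an outlier peer before the time-series stage ever sees it:

```go
package main

import (
	"fmt"
	"sort"
)

// SatPerKWeight mirrors the chainfee fee-rate type (sat per kilo-weight).
type SatPerKWeight int64

// median collapses one polling round: the set of feefilter values our
// peers report right now ("cohort" aggregation). Integer division gives
// the lower midpoint for even-length inputs in this sketch.
func median(vals []SatPerKWeight) SatPerKWeight {
	s := append([]SatPerKWeight(nil), vals...)
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

func main() {
	// Dimension 1: cohort aggregation. The outlier peer (1_000_000)
	// is already discarded by taking the median of this round.
	round := []SatPerKWeight{250, 300, 275, 1000000}
	m := median(round)

	// Dimension 2: time-series aggregation. Per-round medians would
	// feed the moving window; the question raised above is whether
	// this second stage still buys us anything.
	window := []SatPerKWeight{260, 270, m}
	fmt.Println(m, median(window))
}
```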
...exclusively using outbound peers' feefilter...
So thinking more about this, I understand that if we use outbound we have a bit more selectivity when it comes to who we trust to implicitly set our fees for us. The issue is that just because they are outbound connections doesn't mean that they won't be malicious. It does prevent drive-bys, but knowing this, someone could simply serve a different fee filter for outbound connections vs inbound ones. However, suppose that happens: do we lose anything other than data points? I don't think so. So I think we can fix a lower sample count by just instructing bitcoind to have more outbound peers using `addnode` and the like. Alternatively, if that's too much effort, we can always take up to the same number of samples and prefer the ones that are outbound over the ones that are inbound.
I do think that it's probably a good idea to have an escape hatch here as I don't believe bitcoind has a minimum count of nodes it is connected to outbound. So at minimum we probably need to handle the zero case.
...polling interval and the moving window...
The moving window size probably doesn't need to be very large as I expect that most of the relevant smoothing will come from cohort aggregation rather than time-series aggregation.
A quicker polling rate is probably better and we can either extend the window or time-bucketing to make it such that older data is at a lower resolution. That said, like I said in the above commentary, I don't think we need to get cute with the windowing calcs because I expect the important info to come from cohort aggregation.
Just realized … since this is a …, that being said, I don't think we are sending …
If we do this, we need a ceiling because o/w if a user sets this to 100% their LND node will set this to a high value and they will lose their money to fees.
This was laziness on my part, I didn't feel like adding more values to the moving window.
That's true, the only benefit here would be to smooth out any shocks. But as you say if there are shocks it reflects what the broader network is doing and we should probably react to them.
It is much harder for a malicious entity to be the majority of our outbound connections than it is for them to be the majority of our inbound connections (trivial). Assuming the user is running bitcoind, they would need to pollute our addrman table and hope that the next time we start up that their nodes are chosen for outbound connection. It's not impossible, but if we're going to use data from our peers it should be the outbound peers. This is another reason we should try to enforce sensible bounds on the data we receive.
For this to have any effect, a lot of nodes would have to be running this patch to serve different outbound feefilters. This would result in transaction relay failing for a lot of txn's. Also this type of coordination would probably not go unnoticed. For example, I have a script that would detect this sort of broader network behavior. Again this isn't impossible for somebody to pull off, but we should have sensible limits in place to mitigate any risk here. Deciding on what those will be might be tricky.
I don't follow, can you clarify?
We could do this, bitcoind will let us do 8 addnode connections (https://github.com/bitcoin/bitcoin/blob/5b8c5970bdfc817cac9b59f699925c4426c59b61/src/net.h#L1488). However, I think lnd instructing bitcoind how to behave is unprecedented for our codebase.
I don't think we should ever use inbound data
Yup I agree.
Our bitcoind peers will periodically send us feefilters (https://github.com/bitcoin/bitcoin/blob/5b8c5970bdfc817cac9b59f699925c4426c59b61/src/net_processing.cpp#L5463-L5472) so it makes sense to poll.
I don't think we need polling for
If we don't do polling we won't have this issue. As of today, publishing a tx already involves two RPC calls,
We might as well call … Re inbound and outbound peers, I find this and this post helpful. In short, I agree with @Crypt-iQ that we should only use outbound peers.
I was agreeing with you in that, I don't think there is a real negative consequence other than just smaller sample size if we go with outbound-only data.
Hmm yeah I agree that maybe this is a route we'd like to avoid. I do think if we are restricting to outbound only, we need to ensure we always have enough of those connections. Do we know bitcoind's behavior here?
ACK
On second thought I think keeping the moving window is good because if we suddenly lose the majority of our outbound peers and only have one or two peers advertising 1000 sats/KvB (about 10% of the network will return this), we'll potentially be unable to broadcast our transactions.
It attempts to keep 8 outbound connections by default. IIRC this limit can be raised.
If we don't adjust our min relay feerate ahead of time (i.e. before we need to go onchain), we won't have a commitment transaction that can actually relay. I think we really need to do polling so that we can do
ACK. I don't think the configurable percentile idea should be implemented right now. Let's keep things simple and reduce configuration until we have a strong reason to do that. Median sounds like a good default -- 50% of our outbound peers are expected to relay our min fee transactions. Moving median seems good to smooth out spikiness caused by peer disconnects.
I agree outbound peers are preferred. Perhaps we should set a minimum number of data points we want (e.g., 3?). If we can't get that many data points from outbound connections, we consider some inbound connections to make up the difference.
5 minute polling interval seems reasonable to me. I wonder if we should reduce the size of the moving window though. A size of 25 would mean that we use the median relay feerate over the past ~13 blocks. If feerates happen to be consistently increasing, we could miss HTLC deadlines because our moving window was too slow to update.
Every time we create transactions, when estimating fees, we already call
Could you elaborate a bit? If we only have two peers with 1000 sats/kvB, doesn't it mean the polling data won't be useful because the only peers we have won't accept our transactions?
The max number of outbound connections is independently limited to 10.
If we instantaneously query for feefilters rather than poll, we run into an issue if we only have one or two peers at the time of the query.
So this is what the moving window would help us with, it would smooth out jitter caused by some of our outbound peers disconnecting. If our two remaining outbound peers advertise 1000 sats/KvB rather than the network majority of ~20,000 sats/KvB, we'll create txn's >= 1000 sats/KvB instead of ~ >= 20,000 sats/KvB since we won't have any memory of when we had 8 outbound peers.
Two of these are block-relay-only and won't send us feefilter values.
I'm wary about this because it could open the door to some malicious behavior. It might be better to just have few data points rather than some inbound peers making us pay too much or too little in fees. bitcoind does actively try to maintain 8 outbound connections so I would expect that we'd have 8 outbounds most of the time.
Yeah 13 blocks might be too much. What block value do you (or anybody else) have in mind?
Cool, I was thinking the opposite: that the network majority is 500 sats/kvB and we have two peers at 1000 sats/kvB. By polling we'd have the memory of 500 sats/kvB, which gives us a feerate that won't be accepted by our peers. Under the same assumed condition, that we only have two peers, if we don't poll, it means in the worst case we still have 2 peers relaying the tx for us. If we poll, the worst case is we cannot get our tx relayed. I think we should instead focus on maintaining X outbound peers - maybe add an outbound peers check in
If we have 8 outbound peers all the time, then either approach behaves the same. But we should think about corner cases too. What do we do if there's 0 outbound peers? Or what if there's only 1 and its feefilter is an outlier (say 20x higher than actual min relay fee)?
3 blocks?
I don't think it's the same - it's more design, hence more complicated. Plus when 8 peers all increase their … I also don't think the number of outbound connections is something we can or should control. We can only inform users, and suggest they take care of their bitcoind.
The other option would be to use an EMA instead of an SMA here; this will cause us to respond more aggressively to recent history, which will increase jitter but still respond to a dynamic environment. If the primary jitter we are trying to smooth is coming from peer churn, then I think an EMA would be sufficient, as it would still have some smoothing properties, but any MA will lag the current "correct" reading (by definition). So in an increasing fee environment, no matter what [X]MA we go with, we will have to cope with the fact that we'll be behind the trend.
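As a sketch of the EMA idea under discussion (the alpha value, the sample data, and the names are arbitrary assumptions, not anything from the PR):

```go
package main

import "fmt"

// ema folds one new per-round median into the running estimate.
// alpha near 1 tracks the newest reading aggressively (more jitter);
// alpha near 0 smooths harder but lags a rising fee environment more.
func ema(prev, next, alpha float64) float64 {
	return alpha*next + (1-alpha)*prev
}

func main() {
	// Per-round medians (sat/KvB) in a rising fee environment.
	rounds := []float64{1000, 1000, 2000, 4000, 8000}

	est := rounds[0]
	for _, r := range rounds[1:] {
		est = ema(est, r, 0.5)
	}

	// Any [X]MA lags the latest reading; the EMA just lags less than
	// an SMA with a comparable effective window.
	fmt.Printf("latest=%v ema=%v\n", rounds[len(rounds)-1], est)
}
```

With alpha = 0.5 the estimate ends well below the latest reading, which illustrates the "behind the trend" point made above.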
We could skip a round if we don't have enough data points, but not sure of all the implications.
If we skip the round we should interpolate the data, solving for the reading we would need to keep the output value constant. This is so that we don't distort the measurement for future readings.
Good point. It's important not to undershoot the relay fee for commitment transactions. A moving window will undershoot during rising fees, while an on-demand polling strategy may undershoot if peers have disconnected right before we do the poll. Instead of a moving average, how about a max-of-medians in the lookback window? That would mean that the latest data point is used in a rising fee environment, while older data is used for smoothing when fees are falling or oscillating. Another consideration is that we may want to set the commitment fee rate at some % above the min relay fee, to provide some protection if relay fees rise soon after we craft the commitment transaction.
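A small stand-alone sketch of the max-of-medians idea (illustrative only; the real code would use the chainfee types, and the upper-median choice here is an arbitrary simplification):

```go
package main

import (
	"fmt"
	"sort"
)

// maxOfMedians returns the maximum of the per-round medians in the
// lookback window. The freshest round wins while fees rise; older
// (higher) rounds dominate while fees fall or oscillate, so the result
// never undershoots any round in the window.
func maxOfMedians(window [][]int64) int64 {
	var best int64
	for _, round := range window {
		s := append([]int64(nil), round...)
		sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
		m := s[len(s)/2] // upper median is fine for this sketch
		if m > best {
			best = m
		}
	}
	return best
}

func main() {
	// sat/KvB per-peer readings, one slice per polling round.
	rising := [][]int64{{1000, 1100}, {2000, 2100}, {4000, 4100}}
	falling := [][]int64{{4000, 4100}, {2000, 2100}, {1000, 1100}}
	fmt.Println(maxOfMedians(rising), maxOfMedians(falling))
}
```

Both orderings yield the same high estimate: in the rising case it comes from the latest round, in the falling case from the oldest, matching the behavior described above.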
IMO it's at most 8 data points and I don't think we should over-design here. I feel the only thing we should fix is to make sure the majority of the outbound peers can relay our txns, and let users handle maintaining X amount of peers themselves, since it's out of our control? We also shouldn't care about how our peers relay the txns, hence the network condition here - that's the job for
The goal is to get the tx accepted by our peers,
- normal condition, no fee spikes, no malicious peers: no difference between polling data and realtime data, tho the actual feerate may differ, the end effect stays the same.
- fee spikes, no malicious peers: polling data is likely to fail because history doesn't reflect the present.
- too few peers: say there're only 2 peers, polling data is likely to miss, resulting in the tx being rejected by the 2 peers.
- malicious peers: nothing we can do, nor can we distinguish them; it's the bottom layer's responsibility to get it right, and iirc the outbound peers are chosen at random. Since we are using the median here, it's unlikely the attacker can control up to 5 outbound peers.
- too few peers and malicious: polling helps, until that window is filled up with the malicious data, which means roughly 1 hour later it will behave the same as realtime data. Plus there's nothing we can do atm but inform users to manage their connections.
My main blocker is that the history of the 8 peers doesn't reflect the present preferences, not to mention we may be facing a group of 8 different peers. This doesn't mean the above issues cannot be solved - we can extend the crawler made by @Crypt-iQ and even manage the peer connections in `btcwallet` based on our understanding of the feefilter distribution, freeing ourselves from using outbound peers only by applying randomized peer selection, while keeping the current PR simple.
It helps because if we use on-demand querying and have one data point, we shouldn't use just that one data point. Instead we can use the lookback window (which would store polling data for the last 3 blocks).
I'm thinking the opposite, if we only have one data point then we only have one peer, then that's the only peer we can rely on, assuming there are no inbound peers, which I think is the case by default. In addition, this is an error state, and we should implement #8487 to mitigate it.
```go
// filterManager uses our peer's feefilter values to determine a
// suitable feerate to use that will allow successful transaction
// propagation.
filterManager *filterManager
```
I think we can just extend `minFeeManager` to implement all the new methods?
```go
}

func fetchBitcoindFilters(client *rpcclient.Client) ([]SatPerKWeight, error) {
	resp, err := client.RawRequest("getpeerinfo", nil)
```
Should be replaced with `GetPeerInfo`.
I don't think we can use this because `feefilter` in `btcjson.GetPeerInfoResult` is typed as a `uint64` and not a `float64`.
Can we document that here so that someone coming in later doesn't try to "fix" it?
hmm ok, does this mean `btcd` has been interpreting this field wrong all along when receiving it from its peers...
Double checked - I think `btcd` is behaving as expected, and I also think `bitcoind` is sending this msg using an int but somehow printing it using a float, as according to the specs this should be an `int64` field.
Tested it by connecting to this weird node, `"[2a02:c206:2016:2394::1]:8333"`, that has a feefilter set to `0.09170997`...
- From bitcoind: `"minfeefilter": 0.09170997`
- From btcd: `"feefilter": 9170997`
That's per the BIP: bitcoind converts it to a float when using `getpeerinfo` but uses an int on the p2p layer.
```go
// checkFilterBounds returns false if the filter is unusable and true if it is.
func checkFilterBounds(filter SatPerKVByte) bool {
	// Ignore values of 0 and MaxSatoshi. A value of 0 likely means that
```
Would be nice to have some logs here
@Roasbeef: review reminder
I've addressed the main contention of the moving median by removing that logic and just tracking a median without any history. I'll address the other comments later. There are two TODOs left in the code:
In the meantime, I'll run the code with both bitcoind & btcd backends.
Approach ACK 🙌
A few nits and organizational recommendations.
```go
}

var outboundPeerFilters []SatPerKWeight
for _, peerResp := range resp {
```
I'd like to cite this for loop as an example of one of the many classes of for loops we can permanently eliminate if we merge #8483.
There's nothing wrong with this code, just could have condensed 186-205 into a few filter and map operations.
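A hedged sketch of what that condensation could look like. `Filter` and `Map` here are hand-rolled stand-ins, since the exact helpers proposed in #8483 may differ in name and signature, and `peerInfo` is an invented type for illustration:

```go
package main

import "fmt"

// Filter keeps the elements of xs for which pred returns true.
func Filter[A any](pred func(A) bool, xs []A) []A {
	var out []A
	for _, x := range xs {
		if pred(x) {
			out = append(out, x)
		}
	}
	return out
}

// Map applies f to every element of xs.
func Map[A, B any](f func(A) B, xs []A) []B {
	out := make([]B, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// peerInfo is a stand-in for the getpeerinfo response entries.
type peerInfo struct {
	inbound   bool
	feeFilter int64 // sat/kvB
}

func main() {
	peers := []peerInfo{
		{inbound: false, feeFilter: 1000},
		{inbound: true, feeFilter: 9999},
		{inbound: false, feeFilter: 2000},
	}

	// The hand-rolled loop in the diff collapses to: keep outbound
	// peers, then project out their feefilter values.
	outbound := Filter(func(p peerInfo) bool { return !p.inbound }, peers)
	filters := Map(func(p peerInfo) int64 { return p.feeFilter }, outbound)
	fmt.Println(filters)
}
```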
Latest change is that the median feefilter is now capped when used in … There are three scenarios to consider here:
I'm good to ship this as is. Ideally we fix the last few nits in the suggestions but there's nothing pressing that's wrong here so 🫡
EDIT: The linter is failing, though fyi.
I think it's almost good to go, except we need some unit tests.
lnwallet/chainfee/filtermanager.go (Outdated)

```go
	return 0, errNoData
}

return f.median.UnsafeFromSome(), nil
```
Q: should `UnsafeFromSome` only be used in testing @ProofOfKeags?
This is a situation where you should do `UnwrapOrErr` and combine it with the previous if block.
done
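To illustrate the `UnwrapOrErr` pattern with a self-contained stand-in (this minimal `Option` type is invented for the example; lnd's `fn` package has its own implementation whose details may differ):

```go
package main

import (
	"errors"
	"fmt"
)

// Option is a minimal stand-in for the fn.Option type used in the PR.
type Option[A any] struct {
	some bool
	val  A
}

func Some[A any](a A) Option[A] { return Option[A]{some: true, val: a} }
func None[A any]() Option[A]    { return Option[A]{} }

// UnwrapOrErr collapses the "if empty { return errNoData }; return
// UnsafeFromSome()" pair into a single call: the error covers None.
func (o Option[A]) UnwrapOrErr(err error) (A, error) {
	if !o.some {
		var zero A
		return zero, err
	}
	return o.val, nil
}

var errNoData = errors.New("no feefilter data")

func main() {
	median := Some(int64(2500))
	v, err := median.UnwrapOrErr(errNoData)
	fmt.Println(v, err)

	empty := None[int64]()
	_, err = empty.UnwrapOrErr(errNoData)
	fmt.Println(err)
}
```

The point of the suggestion is that no code path ever calls an unsafe accessor on a possibly-empty value.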
lnwallet/chainfee/filtermanager.go (Outdated)

```go
// Log the new median.
f.median = fn.Some(med(feeFilters))
log.Debugf("filterManager updated moving median to: %v",
	f.median.UnsafeFromSome().FeePerKVByte())
```
Yeah, if you could just allocate a stack variable here for the answer rather than using `UnsafeFromSome`, that would be preferred.
done
Added tests
Send it.
LGTM🎉
Needs a rebase.
This commit introduces a filterManager that uses our outbound peers' feefilter values to determine an acceptable minimum feerate that ensures successful transaction propagation. Under the hood, a moving median is used as it is more resistant to shocks than a moving average.
Mostly looking for an approach ACK on: