Address elevated transaction fees. #4022
Conversation
* Log load fee values (at debug) received from validations.
* Log remote and cluster fee values (at trace) when changed.
* Refactor JobQueue::isOverloaded to return sooner if overloaded.
* Refactor Transactor::checkFee to only compute fee if ledger is open.
I love that this fix works by removing so much code!
A question on the intended design of deterministic queuing: if the queue is full, can a new transaction kick a previous transaction with the same fee, but a lower hash, out of the queue? If so, "hash mining" is a theoretical way to game the system, but it is likely irrelevant, since paying even 1 drop more also prioritizes the transaction, and for the foreseeable future 1 drop will often be less valuable than the effort to mine a lower hash.

If a same-fee-lower-hash transaction can't evict a previous transaction, then isn't it still possible for validators to end up with very different—potentially even non-overlapping—queues? Suppose that, for each validator, there is a spammer sending numerous minimum-fee transactions from a node that is near that validator in the network topology. Each spammer could claim all the minimum-fee slots in their nearest validator's queue, leaving no possible overlap between validators' choices of minimum-fee transactions.

It would be preferable to actually validate as many of each spammer's transactions as possible, to burn the fees, and I'd argue this is more important than preventing someone from mining hashes to save 0.000001 XRP. So I think the answer should be "Yes, a same-fee-lower-hash transaction evicts another transaction from the queue if it is full."
bool
operator()(const MaybeTx& lhs, const MaybeTx& rhs) const
{
    if (lhs.feeLevel == rhs.feeLevel)
        return lhs.txID < rhs.txID;
Like @mDuo13 said in the comments, this is mineable, but if the variation of fees is random enough, it shouldn't matter. If you wanted to be fancy, you could use the ledger sequence, or even better, the previous ledger hash, as a salt for a non-cryptographic hash of the transaction id, and order by that.
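To illustrate the salting idea, here is a rough sketch of such a comparator; the type names, the 64-bit truncation, and the splitmix64-style mix function are placeholder assumptions, not rippled's actual types or code:

```cpp
#include <cstdint>

// Placeholder stand-in for the queued-transaction type; the real MaybeTx
// holds full 256-bit IDs, this sketch truncates to 64 bits for brevity.
struct QueuedTxSketch
{
    std::uint64_t feeLevel;
    std::uint64_t txIdLow64;  // low 64 bits of the transaction ID
};

struct OrderBySaltedTxId
{
    std::uint64_t parentHashLow64;  // low 64 bits of the parent ledger's hash

    // Cheap non-cryptographic mixer (splitmix64 finalizer constants).
    static std::uint64_t mix(std::uint64_t x)
    {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    bool operator()(QueuedTxSketch const& lhs, QueuedTxSketch const& rhs) const
    {
        // Higher fee levels still sort first; only the tie-break is salted,
        // so the per-ledger tie order cannot be pre-mined.
        if (lhs.feeLevel == rhs.feeLevel)
            return mix(lhs.txIdLow64 ^ parentHashLow64) <
                   mix(rhs.txIdLow64 ^ parentHashLow64);
        return lhs.feeLevel > rhs.feeLevel;
    }
};
```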
Yes, please consider using a method similar to the transaction processing ordering for queue sorting as well. It might not be an issue now, but it seems like it could be used to gain an unfair advantage. Even worse, that unfair advantage might be used to put increasing load on the network, since I don't see much downside to immediately pushing out every incrementally "better" transaction instead of waiting for a "winner", and just letting the remaining ones become invalid (e.g. by using the same sequence number?).
Maybe some of the logic and thinking from CanonicalTXSet could be reused, in a reusable class?
https://github.com/ripple/rippled/blob/9d89d4c1885e13e3c0aa7cdb4a389c6fbfd3790a/src/ripple/app/misc/CanonicalTXSet.h#L173-L174
https://github.com/ripple/rippled/blob/9d89d4c1885e13e3c0aa7cdb4a389c6fbfd3790a/src/ripple/app/misc/CanonicalTXSet.h#L128-L133
https://github.com/ripple/rippled/blob/9d89d4c1885e13e3c0aa7cdb4a389c6fbfd3790a/src/ripple/app/misc/CanonicalTXSet.cpp#L24-L49
Regarding mining for a low txID, I agree with @MarkusTeufelberger and @donovanhide that this is an issue. But I think it's not a top priority, so I don't personally believe it needs to be addressed in this particular pull request (which I think we'd hope to see on the network fairly soon).
My recollection of boost::intrusive::multiset (the container type for byFee_) is foggy. But my best guess is that an efficient way to randomize access in the container, while still staying within a fixed fee range, would involve reorganizing the container. That may take a bit of time. I encourage you to take that time (soon) so these legitimate concerns will be addressed. But, again, I don't think that change needs to be in this particular pull request.
While we were discussing this fix, @thejohnfreeman suggested using the parent ledger's hash, just as you did, @donovanhide. I think it makes sense and it's the solution I would prefer too.
But I'm of two minds here: on the one hand, I agree that adding this sort of "anti-mining" support makes sense and is fairly easy; on the other hand, given that we're all moving fast to try and get this fix out on an expedited basis, I'd rather minimize the number of changes.
So I'm fine with leaving this as a "TODO", but if others feel it's important to have it added now and @ximinez feels he can add it with minimal risk, I'm supportive.
First, @mDuo13 , @MarkusTeufelberger , and @donovanhide , thank you very much for the feedback! I've already got anti-mining on the radar, and as @scottschurr mentioned, we've got some ideas for it.
However, to emphasize what @nbougalis said, the priority for this PR is to keep the changes as small, simple, and easy to reason about as possible.
I also think that the risk right now, even just using the unmodified TX id, is small, for a few reasons.
- If you want to jump ahead of someone specifically for some reason, it's probably a lot cheaper to pay one drop more than it is to pre-mine a TX with a predetermined fee.
- Once your transaction is out of the queue, it has to contend with the sorted transaction set, which does use the parent ledger hash.
- The only meaningful attack I can figure out is to get your transaction into the open ledger / proposed set and prevent another transaction from getting in. That's going to be really hard to time perfectly, over and above mining the hash, partly because you won't know for sure how many other txs are in the ledger or the queue, and you have to do it on a majority of validators.
So yes, there's a risk, and I'd like to address it in another minor release soon, but it seems like it's a very small risk.
I agree that this issue should not be addressed in this PR. However, I just wanted to highlight the nuances of one particular fictional example of a motive for mining:
I'm market maker A, and I have a competitor, market maker B, who doesn't appear to monitor their bot very closely. They don't use a dynamic fee, just 12 drops. I want to stop B's offers from getting into the ledger so people consume my offers instead. I pay one more drop for my offers and stuff the queue with pointless 12-drop AccountSet transactions whose mined transaction IDs are likely to sort before most, if not all, of B's offers.
@donovanhide Yeah, I can see how that might be a viable attack. It's still an open question whether it would be cheaper to just have those noop transactions pay 13 drops vs. pre-mining them.
I think I recall somewhere that "fee level" is not the same as "fee amount"? If so, it might be worth considering sorting by the actual amount, so AccountDelete transactions (which are mostly beneficial in the longer run for validators - fewer objects in the ledger is better!) get heavily prioritized. This would also solve #4016
Correct, a "fee level" is, essentially, "what percentage of the minimum for this particular transaction are you paying?" Fee levels were invented for the queue; without them, transactions that involve extra work—like escrows with fulfillments, or multi-signed transactions—could kick out other, simpler transactions that were paying their respective minimums without having to pay anything extra themselves. It's only with AccountDelete that it gets weird, because the "1 owner reserve" amount is so out of scale with the other transaction types' costs.
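To make the proportional idea concrete, here is a small self-contained sketch; the 256 base level, the 10-drop reference cost, and the 50-drop multi-sign cost are assumptions for illustration, not values pulled from the TxQ source:

```cpp
#include <cstdint>
#include <iostream>

// Assumed fee level of a transaction paying exactly its own minimum.
constexpr std::uint64_t baseLevel = 256;

// "What percentage of the minimum for this particular transaction are you paying?"
std::uint64_t
feeLevel(std::uint64_t feePaidDrops, std::uint64_t txMinimumDrops)
{
    return feePaidDrops * baseLevel / txMinimumDrops;
}

int main()
{
    // A simple payment (assumed 10-drop minimum) paying 12 drops...
    std::cout << feeLevel(12, 10) << '\n';  // 307
    // ...sorts ahead of a multi-signed transaction (assumed 50-drop minimum)
    // paying exactly its minimum, even though 50 drops > 12 drops.
    std::cout << feeLevel(50, 50) << '\n';  // 256
}
```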
Looks good as far as I can see. I left a couple of notes about where I was surprised, so a couple more comments might be called for. But I leave those to your discretion.
I am hoping to see another pull request from you (soon) to address concerns raised by @mDuo13, @donovanhide, and @MarkusTeufelberger. But I don't think those concerns need to be handled in this particular pull request.
bool
operator()(const MaybeTx& lhs, const MaybeTx& rhs) const
{
    if (lhs.feeLevel == rhs.feeLevel)
        return lhs.txID < rhs.txID;
I was initially surprised to see the comparison operator direction swapped between txID and feeLevel. Feels like it might be worth a comment that we sort with the smallest txID at the front, and the largest feeLevel at the front.
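For reference, a commented sketch of the ordering being discussed; the wrapping class is omitted and only the two lines shown in the diff above are from the actual source, the rest is illustrative:

```cpp
// Sort the queue so that the largest feeLevel is at the front and, within a
// fee level, the smallest txID is at the front.
bool
operator()(const MaybeTx& lhs, const MaybeTx& rhs) const
{
    if (lhs.feeLevel == rhs.feeLevel)
        return lhs.txID < rhs.txID;      // ascending: smaller hash breaks the tie
    return lhs.feeLevel > rhs.feeLevel;  // descending: higher fee level first
}
```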
I left such a comment at the top of the class. I can move it to the function, though.
Done
JLOG(j_.warn())
++lastRIter;
}
if (lastRIter == byFee_.rend())
It looks like, for this condition to be met, the entire contents of byFee_ would have to contain transactions only from this account. And byFee_ is at capacity at this point. But the TxQ limits the number of transactions from an individual account to maximumTxnPerAccount, which is presumably smaller than the total byFee_ size. That said, I'm glad you're checking lastRIter == byFee_.rend().
Maybe it's worth a comment that says we never expect this to happen?
Good point. It does happen in a unit test where the limits are much smaller, and I suppose it would be possible if someone set their config to some unusual values.
Done
bool
operator()(const MaybeTx& lhs, const MaybeTx& rhs) const
{
    if (lhs.feeLevel == rhs.feeLevel)
        return lhs.txID < rhs.txID;
I'm fine with this as-is. @donovanhide and @thejohnfreeman have both discussed mitigations to gaming the ordering, and I think that would be great to add, but, for me, it's not a requirement for this change.
The good thing about this change is that it is not transaction-breaking, so it doesn't require an amendment. Good job.
bool
operator()(const MaybeTx& lhs, const MaybeTx& rhs) const
{
    if (lhs.feeLevel == rhs.feeLevel)
        return lhs.txID < rhs.txID;
I think any solution to this issue is going to be more complicated than I think would be safe for this small, simple, easy-to-reason-about fix. Also, I would like to see what the consequences / fallout of this fix are before we jump too far into this issue. It's reasonably possible that once the dust settles, things will clear up enough that AccountDelete transactions will succeed paying only their base fee.
I just pushed two commits:
(force-pushed from c0d1859 to 5923a8d)
👍 Looks great. Thanks!
* Sort by fee level (which is the current behavior), then by transaction ID (hash).
* Handle the edge case where the account at the end of the queue submits a higher-paying transaction: walk backwards and compare against the cheapest transaction from a different account.
* Use std::any_of to simplify the JobQueue::isOverloaded loop.
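On the last bullet, a minimal sketch of that style of simplification; the member names and the per-type isOverloaded check here are invented for illustration and don't mirror rippled's actual JobQueue internals:

```cpp
#include <algorithm>
#include <map>

// Illustrative stand-ins, not rippled types.
struct JobTypeDataSketch
{
    int waiting = 0;
    int limit = 10;

    bool isOverloaded() const { return waiting > limit; }
};

struct JobQueueSketch
{
    std::map<int, JobTypeDataSketch> jobData_;

    // std::any_of returns as soon as one overloaded job type is found,
    // instead of scanning the whole map while accumulating a flag.
    bool isOverloaded() const
    {
        return std::any_of(
            jobData_.begin(), jobData_.end(), [](auto const& item) {
                return item.second.isOverloaded();
            });
    }
};
```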
(force-pushed from 5923a8d to aa20afa)
Squashed the fold commit. No other changes.
High Level Overview of Change
Two main changes:
1. Transactions that are deferred from one ledger to the next are no longer put back into the (possibly full) transaction queue, where they could be dropped entirely.
2. Transactions with the same fee level are ordered deterministically by transaction ID (hash) instead of by arrival time.
Also added and changed some logging and made some minor optimizations.
Context of Change
Between about 14:00 and 17:00 UTC on December 1st, transaction submission rates went up 10-fold. This caused some unexpected side effects. The most observable effect was that transaction queues were full on most nodes in the network, but that was just a symptom. Most of those transactions had a 12 drop fee, but very few of them were getting validated. Transactions paying more would typically land in the open ledger or towards the front of the queues, and were thus being validated. This caused much consternation. The root cause of the problem was that transactions in the proposals for one ledger were being deferred to the next ledger, but the deferral process dropped them entirely because they could not be put back into the now-full queue. tl;dr: they should never have been put back into the queue at all, but this flaw was not apparent until the current conditions presented themselves.
The problem was further exacerbated by the TxQ policy of ordering transactions with the same fee level by arrival time - basically first come, first served. Because the rate at which transactions were being submitted was so high, most nodes saw most transactions in drastically different orders. Validators would propose some of these transactions, but very few nodes proposed the same set of transactions. Using a deterministic ordering will help increase the overlap among validators' initial proposals, and the deferral fix will ensure that any differences are resolved in the next ledger.
Once a majority of UNL validators have this fix, the number of transactions validated should increase dramatically until either they catch up to the backlog, or the new load level becomes the "new normal". Once either of those things happens, the fee escalation logic should have a more accurate view of what the network is capable of and lower baseline fees accordingly.
Note to operators: until a majority of UNL validators are updated, node operators running this change can expect to see 500-1500 transactions in the open ledger, and the transaction queue near or at capacity. This will show a correspondingly large open ledger fee.
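For readers wondering why a fuller open ledger implies a higher open ledger fee: my understanding of the fee escalation design (treat the exact shape and names below as assumptions, not a quote from the code or docs) is that the required fee level grows roughly with the square of how far the open ledger is past its expected size:

```cpp
#include <cstdint>

// Assumed shape of the escalation curve (illustrative only): the fee level
// required to get into the open ledger grows quadratically once the ledger
// holds more transactions than expected.
std::uint64_t
requiredOpenLedgerFeeLevel(
    std::uint64_t medianFeeLevel,
    std::uint64_t txnsInOpenLedger,
    std::uint64_t expectedLedgerSize)
{
    return medianFeeLevel * txnsInOpenLedger * txnsInOpenLedger /
        (expectedLedgerSize * expectedLedgerSize);
}
```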
Type of Change