Address elevated transaction fees. #4022

Closed · wants to merge 5 commits from the deterministic-txq branch

Conversation

@ximinez (Collaborator) commented Dec 9, 2021

High Level Overview of Change

Two main changes:

  • Revert a subset of the changes from ximinez@62127d7, which would defer transactions from one ledger to the next into the transaction queue (TxQ). A transaction that is in the open ledger but doesn't get validated should stay in the open ledger so that it can be proposed in the next transaction set.
  • Order the transaction queue deterministically: first by fee level descending, then by transaction ID / hash ascending. This will improve the overlap of initial proposals from different validators, because they will be more likely to be pulling the same set of transactions out of their queues into their open ledgers.

Also added and changed some logging and made some minor optimizations.
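To illustrate the deterministic ordering described above, here is a minimal, self-contained sketch. The types are simplified stand-ins (rippled's real code uses FeeLevel64 and a 256-bit hash), so treat this as an illustration of the ordering, not the actual implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <string>

// Hypothetical simplified stand-in for a TxQ entry.
struct QueuedTx
{
    std::uint64_t feeLevel;  // escalated fee level, not raw drops
    std::string txID;        // transaction hash, compared lexically here

    // Highest fee level sorts first; ties break by smallest txID first.
    bool
    operator<(QueuedTx const& rhs) const
    {
        if (feeLevel != rhs.feeLevel)
            return feeLevel > rhs.feeLevel;  // descending by fee level
        return txID < rhs.txID;              // ascending by hash
    }
};

// Every node applying this comparator drains its queue in the same order,
// regardless of the order in which the transactions arrived.
std::multiset<QueuedTx> queue{{256, "0xBB"}, {256, "0xAA"}, {512, "0xCC"}};
```

Because the order no longer depends on arrival time, two nodes that received the same transactions in different orders still pull the same set into their open ledgers.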

Context of Change

Between about 14:00 and 17:00 UTC on December 1st, transaction submission rates went up 10-fold. This caused some unexpected side effects. The most observable effect was that transaction queues were full on most nodes in the network, but that was just a symptom. Most of those transactions had a 12 drop fee, but very few of them were getting validated. Transactions paying more would typically land in the open ledger or towards the front of the queues, and were thus being validated. This caused much consternation. The root cause of the problem was that transactions in the proposals for one ledger were being deferred to the next ledger, but the deferral process dropped them entirely because they could not be put back into the now-full queue. tl;dr: those transactions should never have been put back into the queue at all, but this flaw was not apparent until the current conditions presented themselves.

The problem was further exacerbated by the TxQ policy of ordering transactions with the same fee level by arrival time - basically first come, first served. Because the rate at which transactions were being submitted was so high, most nodes saw most transactions in a drastically different order. Validators would propose some of these transactions, but very few nodes proposed the same set of transactions. Using a definitive ordering will help increase the overlap of initial proposals between validators, and the deferral fix will ensure that any differences will be resolved in the next ledger.

Once a majority of UNL validators have this fix, the number of transactions validated should increase dramatically until either they catch up to the backlog, or the new load level becomes the "new normal". Once either of those things happens, the fee escalation logic should have a more accurate view of what the network is capable of and lower baseline fees accordingly.

Note to operators: until a majority of UNL validators are updated, node operators running this change can expect to see 500-1500 transactions in the open ledger, and the transaction queue near or at capacity. This will show a correspondingly large open ledger fee.

Type of Change

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] Refactor (non-breaking change that only restructures code)

* Log load fee values (at debug) received from validations.
* Log remote and cluster fee values (at trace) when changed.
* Refactor JobQueue::isOverloaded to return sooner if overloaded.
* Refactor Transactor::checkFee to only compute fee if ledger is open.
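The JobQueue::isOverloaded refactor mentioned above can be sketched roughly as follows. The types and fields are simplified, hypothetical stand-ins for rippled's JobQueue internals, not the actual API; the point is that the loop returns as soon as any job type reports itself over target instead of always scanning everything:

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Hypothetical simplified stand-in for per-job-type load tracking.
struct JobTypeData
{
    int waiting = 0;  // jobs of this type currently waiting
    int target = 0;   // threshold above which this type is overloaded
    bool
    isOverloaded() const
    {
        return waiting > target;
    }
};

// Short-circuits on the first overloaded job type: the "return sooner"
// behavior, expressed with std::any_of.
bool
isOverloaded(std::map<int, JobTypeData> const& jobData)
{
    return std::any_of(jobData.begin(), jobData.end(), [](auto const& e) {
        return e.second.isOverloaded();
    });
}
```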
@thejohnfreeman (Collaborator) left a comment

I love that this fix works by removing so much code!

src/ripple/core/impl/JobQueue.cpp (outdated comment, resolved)
@mDuo13 (Collaborator) commented Dec 10, 2021

A question on the intended design of deterministic queuing: if the queue is full, can a new transaction kick a previous transaction with the same fee, but lower hash, out of the queue? If so, "hash mining" is a theoretical way to game the system, but likely irrelevant since paying even 1 drop more also prioritizes the transaction, and for the foreseeable future 1 drop will often be less valuable than the effort to mine a lower hash.

If a same-fee-lower-hash transaction can't evict a previous transaction, then isn't it still possible for validators to end up with very different—potentially even non-overlapping—queues? Suppose that, for each validator, there is a spammer sending numerous minimum-fee transactions from a node that is near the validator in the network topology. Each spammer could claim all the minimum-fee slots in their nearest validator's queue, resulting in no overlap possible between validators' choices of minimum-fee transactions.

It would be preferable to actually validate as many of each spammer's transactions as possible, to burn the fees, and I'd argue this is more important than preventing someone mining hashes to save 0.000001 XRP, so I think the answer should be "Yes, a same-fee-lower-hash transaction evicts another transaction from the queue if it is full."
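The eviction rule being asked about here can be made concrete with a small sketch. This is a hypothetical illustration of the "yes, it evicts" answer, not rippled's actual TxQ policy: with a full queue under the new ordering, a new transaction is admitted only if it sorts strictly before the current worst entry, which means a same-fee, lower-hash transaction does evict.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Toy queue entry; txID stands in for the 256-bit transaction hash.
struct Entry
{
    unsigned feeLevel;
    unsigned txID;
    bool
    operator<(Entry const& rhs) const
    {
        if (feeLevel != rhs.feeLevel)
            return feeLevel > rhs.feeLevel;  // higher fee level first
        return txID < rhs.txID;              // lower hash first
    }
};

// Hypothetical admission rule: a new transaction enters a full queue only
// if it sorts strictly before the current worst entry, evicting it.
bool
tryAdmit(std::multiset<Entry>& q, std::size_t maxSize, Entry const& tx)
{
    if (q.size() < maxSize)
    {
        q.insert(tx);
        return true;
    }
    auto worst = std::prev(q.end());
    if (tx < *worst)
    {
        q.erase(worst);
        q.insert(tx);
        return true;
    }
    return false;
}
```

Under this rule, a transaction with the same fee level but a lower hash than the worst queued entry would displace it, matching the "burn the fees" preference argued for above.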

bool
operator()(const MaybeTx& lhs, const MaybeTx& rhs) const
{
    if (lhs.feeLevel == rhs.feeLevel)
        return lhs.txID < rhs.txID;
(Contributor) commented:

Like @mDuo13 said in the comments, this is mineable, but if the variation of fees is random enough, it shouldn't matter. If you wanted to be fancy, you could use the ledger sequence, or even better, the previous ledger hash, as a salt for a non-cryptographic hash of the transaction ID, and order by that.
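That salting idea might look something like the following sketch. The helper name and hash choice are hypothetical, not part of rippled; the point is that the salt changes every ledger, so pre-mining a "low" txID buys no lasting advantage, while every node still computes the same order because they all agree on the parent ledger's hash:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical helper (not rippled's API): derive the tie-breaking sort
// key from a non-cryptographic hash of the txID, seeded with the parent
// ledger's hash.
std::uint64_t
saltedSortKey(std::string const& txID, std::string const& parentLedgerHash)
{
    std::uint64_t h = std::hash<std::string>{}(parentLedgerHash);
    // Mix in the txID, boost::hash_combine-style.
    h ^= std::hash<std::string>{}(txID) + 0x9e3779b97f4a7c15ULL + (h << 6) +
        (h >> 2);
    return h;
}
```

Ties would then be ordered by saltedSortKey instead of by raw txID, with the key recomputed each time the parent ledger changes.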

(Collaborator) commented:

Yes, please consider using a method similar to transaction processing ordering for queue sorting as well. It might not be an issue now, but it seems like it could be used to gain an unfair advantage. Even worse - the unfair advantage might be used to put increasing load on the network, since I don't see much downside to pushing every incrementally "better" transaction out immediately instead of waiting for a "winner" and just letting the remaining ones become invalid (e.g. through using the same sequence number?).

(Collaborator) commented:

Regarding mining for a low txID, I agree with @MarkusTeufelberger and @donovanhide that this is an issue. But I think it's not a top priority, so I don't personally believe it needs to be addressed in this particular pull request (which I think we'd hope to see on the network fairly soon).

My recollection of boost::intrusive::multiset (the container type for byFee_) is foggy. But my best guess is that an efficient way to randomize access in the container, while still staying within a fixed fee range, would involve reorganizing the container. That may take a bit of time. I encourage you to take that time (soon) so these legitimate concerns will be addressed. But, again, I don't think that change needs to be in this particular pull request.

@nbougalis (Contributor) commented Dec 10, 2021

While we were discussing this fix, @thejohnfreeman suggested using the parent ledger's hash, just as you did, @donovanhide. I think it makes sense and it's the solution I would prefer too.

But I'm of two minds here: on the one hand, I agree that adding this sort of "anti-mining" support makes sense and is fairly easy; on the other hand, given that we're all moving fast to try and get this fix out on an expedited basis, I'd rather minimize the number of changes.

So I'm fine with leaving this as a "TODO", but if others feel it's important to have it added now and @ximinez feels he can add it with minimal risk, I'm supportive.

@ximinez (Collaborator, Author) commented:

First, @mDuo13 , @MarkusTeufelberger , and @donovanhide , thank you very much for the feedback! I've already got anti-mining on the radar, and as @scottschurr mentioned, we've got some ideas for it.

However, to emphasize what @nbougalis said, the priority for this PR is to keep the changes as small, simple, and easy to reason about as possible.

I also think that the risk right now, even just using the unmodified TX ID, is small for a few reasons.

  • If you want to jump ahead of someone specifically for some reason, it's probably a lot cheaper to pay one drop more than it is to pre-mine a TX with a predetermined fee.
  • Once your transaction is out of the queue, now your transaction has to contend with the sorted transaction set, which does use that parent ledger hash.
  • The only meaningful attack that I can figure out is to get your transaction into the open ledger / proposed set and prevent another transaction from getting in. That's going to be really hard to time perfectly over and above mining the hash. Partly because you won't know for sure how many other txs are in the ledger or the queue. And you have to do it on a majority of validators.

So yes, there's a risk, and I'd like to address it in another minor release soon, but it seems like it's a very small risk.

@donovanhide (Contributor) commented:

I agree that this issue should not be addressed in this PR. However, I just wanted to highlight the nuances of one particular fictional example of a motive for mining:
I'm market maker A, who has a competitor, market maker B. I know they don't appear to monitor their bot very closely. They do not use a dynamic fee, just 12 drops. I want to stop B's offers getting into the ledger so people consume my offers instead. I pay one more drop for my offers and stuff the queue with pointless 12-drop AccountSet transactions with mined transaction IDs that are likely to sort before most, if not all, of B's offers.

@ximinez (Collaborator, Author) commented:

@donovanhide Yeah, I can see how that might be a viable attack. It's still an open question whether it would still be cheaper to just have those noop transactions pay 13 drops vs. pre-mining.

@MarkusTeufelberger (Collaborator) commented:
I think I recall somewhere that "fee level" is not the same as "fee amount"? If so, it might be worth considering sorting by the actual amount, so AccountDelete transactions (which are mostly beneficial in the longer run for validators - fewer objects in the ledger is better!) get heavily prioritized. This would also solve #4016

@mDuo13 (Collaborator) commented Dec 10, 2021

> I think I recall somewhere that "fee level" is not the same as "fee amount"? If so, it might be worth considering sorting by the actual amount, so AccountDelete transactions (which are mostly beneficial in the longer run for validators - fewer objects in the ledger is better!) get heavily prioritized. This would also solve #4016

Correct, a "fee level" is, essentially, "what percentage of the minimum for this particular transaction are you paying?" They were invented for the queue; otherwise, transactions that involve extra work—like escrows with fulfillments, or multi-signed transactions—wouldn't need to pay elevated costs to kick out other, simpler transactions that were paying their respective minimums.

It's only with AccountDelete that it gets weird because the "1 owner reserve" amount is so out of scale with the other transaction types' costs.
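A toy illustration of the fee-level idea above. The numbers and helper are illustrative only (rippled's real computation lives in TxQ and uses FeeLevel64), but the shape is as described: a transaction paying exactly its own minimum sits at the base level of 256, whatever that minimum is in drops.

```cpp
#include <cassert>
#include <cstdint>

// The reference transaction paying exactly its minimum fee sits at the
// base level of 256.
constexpr std::uint64_t baseLevel = 256;

// feeLevel = (drops paid / minimum drops for THIS transaction) * 256
constexpr std::uint64_t
feeLevel(std::uint64_t dropsPaid, std::uint64_t minDropsForTx)
{
    return dropsPaid * baseLevel / minDropsForTx;
}

// A simple payment at its (say) 10-drop minimum and a multi-signed
// transaction at its (hypothetical) 50-drop minimum both sit at level
// 256: neither outranks the other just for being intrinsically pricier.
static_assert(feeLevel(10, 10) == 256, "minimum fee -> base level");
static_assert(feeLevel(50, 50) == 256, "higher minimum, same level");
static_assert(feeLevel(12, 10) > 256, "paying 20% over minimum");
```

This is also why AccountDelete is the odd one out: its minimum is an entire owner reserve, so paying "100% of the minimum" is a very different drop amount than for any other transaction type.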

@scottschurr (Collaborator) left a comment

Looks good as far as I can see. I left a couple of notes about where I was surprised, so a couple more comments might be called for. But I leave those to your discretion.

I am hoping to see another pull request from you (soon) to address concerns raised by @mDuo13, @donovanhide, and @MarkusTeufelberger. But I don't think those concerns need to be handled in this particular pull request.

bool
operator()(const MaybeTx& lhs, const MaybeTx& rhs) const
{
    if (lhs.feeLevel == rhs.feeLevel)
        return lhs.txID < rhs.txID;
(Collaborator) commented:

I was initially surprised to see the comparison operator direction swapped between txID and feeLevel. Feels like it might be worth a comment that we sort with the smallest txID at the front, and the largest feeLevel at the front.

@ximinez (Collaborator, Author) commented:

I left such a comment at the top of the class. I can move it to the function, though.

@ximinez (Collaborator, Author) commented:

Done

JLOG(j_.warn())
++lastRIter;
}
if (lastRIter == byFee_.rend())
(Collaborator) commented:

It looks like, for this condition to be met, the entire contents of byFee_ would have to contain transactions only from this account. And byFee_ is at capacity at this point. But the TxQ limits the number of transactions from an individual account to maximumTxnPerAccount, which is presumably smaller than the total byFee_ size. That said, I'm glad you're checking lastRIter == byFee_.rend().

Maybe it's worth a comment that says we never expect this to happen?

@ximinez (Collaborator, Author) commented Dec 11, 2021

Good point. It does happen in a unit test where the limits are much smaller, and I suppose it would be possible if someone set their config to some unusual values.

@ximinez (Collaborator, Author) commented:

Done


@nbougalis (Contributor) left a comment

I'm fine with this as-is. @donovanhide and @thejohnfreeman have both discussed mitigations to gaming the ordering and I think that's great to add but, for me, it's not a requirement with this change.

The good thing about this change is that it is not transaction-breaking and, so, it doesn't require an amendment. Good job.


@ximinez (Collaborator, Author) commented Dec 10, 2021

> I think I recall somewhere that "fee level" is not the same as "fee amount"? If so, it might be worth considering sorting by the actual amount, so AccountDelete transactions (which are mostly beneficial in the longer run for validators - fewer objects in the ledger is better!) get heavily prioritized. This would also solve #4016

I think any solution to this issue is going to be more complicated than would be safe for this small, simple, easy-to-reason-about fix. Also, I would like to see what the consequences / fallout of this fix are before we jump too far into this issue. It's reasonably possible that once the dust settles, things will clear up enough that AccountDelete transactions will succeed paying only their base fee.

@ximinez (Collaborator, Author) commented Dec 11, 2021

I just pushed two commits:

  1. Change some of the default queue sizes. These are a tad more conservative than what I think some of the validators are currently using.
  2. Address @thejohnfreeman and @scottschurr 's suggestions.

@ximinez force-pushed the deterministic-txq branch 2 times, most recently from c0d1859 to 5923a8d, on December 11, 2021 at 20:15
@scottschurr (Collaborator) left a comment

👍 Looks great. Thanks!

* Sort by fee level (which is the current behavior) then by transaction
  ID (hash).
* Edge case when the account at the end of the queue submits a higher
  paying transaction to walk backwards and compare against the cheapest
  transaction from a different account.
* Use std::any_of to simplify the JobQueue::isOverloaded loop.
@ximinez (Collaborator, Author) commented Dec 14, 2021

Squashed the fold commit. No other changes.

@ximinez ximinez deleted the deterministic-txq branch December 17, 2021 19:42
7 participants