TxQ developer docs, and faster ledger size growth #2682

Closed · wants to merge 2 commits into develop from txq_doc

Conversation

@ximinez (Collaborator) commented Sep 12, 2018

Two separate commits; I'm only putting them together because they both affect the TxQ.

  1. Updates the developer docs for the TxQ and related classes. I think I got every class, function, and member.
  2. Allows the open ledger limit to grow by 20% instead of one transaction per ledger, so that we can respond more quickly to "good" network traffic growth. It also cuts the limit by 50% instead of dropping it all the way down to 50 txs when we see "bad" network traffic. An ongoing attack will still quickly drive the limit back down to the minimum, while costing the attacker a lot in fees, but the limit will recover more quickly after short attacks or non-attack "hiccup" scenarios (see the sketch below).

Note: I don't believe this change requires an amendment because it does not affect transaction processing, only queuing behavior.
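
A rough sketch of the grow/shrink rule described in item 2. The function name, signature, and the `minimumLimit` parameter are illustrative only; the real logic lives inside TxQ's fee metrics update, not in a free function like this:

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative sketch only, not the actual TxQ code.
std::size_t
updateExpectedLedgerSize(
    std::size_t currentLimit,      // current open ledger limit
    std::size_t validatedTxCount,  // txns in the latest validated ledger
    bool healthyLedger,            // the validated ledger counted as "good" traffic
    std::size_t minimumLimit)      // configured floor for the limit
{
    if (healthyLedger)
    {
        // Grow: the larger of the current limit or the validated ledger
        // size plus 20%, instead of the old "plus one transaction".
        return std::max(
            currentLimit, validatedTxCount + validatedTxCount / 5);
    }
    // Shrink: take the smaller of the validated size and the old limit,
    // cut it by 50%, and never drop below the configured minimum.
    return std::max(
        std::min(currentLimit, validatedTxCount) / 2, minimumLimit);
}
```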

@ximinez ximinez requested review from mellery451 and scottschurr and removed request for mellery451 September 12, 2018 20:47
@ripplelabs-jenkins (Collaborator) commented Sep 12, 2018

Jenkins Build Summary

Built from this commit

Built at 20181002 - 00:32:16

Test Results

| Build Type | Log | Result | Status |
|---|---|---|---|
| rpm | logfile | 1036 cases, 0 failed, t: n/a | PASS ✅ |
| msvc.Debug | logfile | 1033 cases, 0 failed, t: 543s | PASS ✅ |
| msvc.Debug, NINJA_BUILD=true | logfile | 1033 cases, 0 failed, t: 565s | PASS ✅ |
| msvc.Debug -Dunity=OFF | logfile | 1033 cases, 0 failed, t: 1182s | PASS ✅ |
| msvc.Release | logfile | 1033 cases, 0 failed, t: 394s | PASS ✅ |
| gcc.Release -Dassert=ON, MANUAL_TESTS=true | logfile | 1101 cases, 0 failed, t: 925s | PASS ✅ |
| docs | logfile | 1035 cases, 0 failed, t: 423s | PASS ✅ |
| gcc.Debug -Dcoverage=ON | logfile | 1036 cases, 0 failed, t: 856s | PASS ✅ |
| clang.Debug | logfile | 1036 cases, 0 failed, t: 278s | PASS ✅ |
| gcc.Debug | logfile | 1036 cases, 0 failed, t: 291s | PASS ✅ |
| clang.Debug -Dunity=OFF | logfile | 1036 cases, 0 failed, t: 635s | PASS ✅ |
| gcc.Debug -Dunity=OFF | logfile | 1036 cases, 0 failed, t: 388s | PASS ✅ |
| clang.Release -Dassert=ON | logfile | 1035 cases, 0 failed, t: 403s | PASS ✅ |
| gcc.Release -Dassert=ON | logfile | 1035 cases, 0 failed, t: 425s | PASS ✅ |
| gcc.Debug -Dstatic=OFF | logfile | 1036 cases, 0 failed, t: 295s | PASS ✅ |
| gcc.Debug -Dstatic=OFF -DBUILD_SHARED_LIBS=ON | logfile | 1036 cases, 0 failed, t: 293s | PASS ✅ |
| gcc.Debug, NINJA_BUILD=true | logfile | 1036 cases, 0 failed, t: 267s | PASS ✅ |
| clang.Debug -Dunity=OFF -Dsan=address, PARALLEL_TESTS=false, DEBUGGER=false | logfile | 1036 cases, 0 failed, t: 731s | PASS ✅ |
| clang.Debug -Dunity=OFF -Dsan=undefined, PARALLEL_TESTS=false | logfile | 1036 cases, 0 failed, t: 888s | PASS ✅ |

the validated ledger until it gets to [50](#other-constants), at
which point, the limit will be the largest number of transactions
be the max of the number of transactions in the validated ledger
plus [25%](#other-constants) or the current limit until it gets
Contributor:

is this 25% number here the same as normal_consensus_increase_percent? If so, elsewhere the default says 20%.

Collaborator (author):

Woops. Dang, that messed up the example, too. Fixed.

new largest value by 10% each time the ledger has more than 50
transactions.
more than [50](#other-constants) transactions. Any time the limit
decreases (ie. a large ledger is no longer recent), the limit will
Contributor:

ie. --> i.e.

Collaborator (author):

Fixed

drops, but the 22nd transaction will require
a level of about 355,000 or about 13,800 drops.
* This will cause the first 26 transactions only require 10
drops, but the 27nd transaction will require
Contributor:

27nd --> 27th ?

Collaborator (author):

Fixed

@@ -262,6 +276,7 @@ Result format:
"current_queue_size" : "2", // number of transactions waiting in the queue
"expected_ledger_size" : "15", // one less than the number of transactions that can get into the open ledger for the base fee.
"max_queue_size" : "300", // number of transactions allowed into the queue
"ledger_current_index" : 123456789, // sequence number of the current open ledger
Contributor:

indentation

Collaborator (author):

Fixed

/* So we don't deal with infinite fee levels, treat
any transaction with a 0 base fee (ie. SetRegularKey
/** So we don't deal with "infinite" fee levels, treat
any transaction with a 0 base fee (ie SetRegularKey
Contributor:

ie --> i.e.

Collaborator (author):

Fixed

@@ -267,18 +425,36 @@ class TxQ
};
}

/** Use the number of transactions in the current open ledger
to compute the fee level a transaction must pay bypass the
Contributor:

...pay TO bypass... ?

Collaborator (author):

Fixed
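
The doc comment under discussion ("use the number of transactions in the current open ledger to compute the fee level a transaction must pay to bypass the queue") is the core of fee escalation. A simplified sketch follows, assuming the quadratic escalation rule documented in FeeEscalation.md; the names and signature are invented, `expectedLedgerSize` is assumed nonzero, and the real code guards the multiplications against overflow (via mulDiv):

```cpp
#include <cstddef>
#include <cstdint>

// Simplified, illustrative version of open-ledger fee escalation.
std::uint64_t
requiredLevelToBypassQueue(
    std::size_t openTxCount,             // txns already in the open ledger
    std::size_t expectedLedgerSize,      // current open ledger limit (nonzero)
    std::uint64_t escalationMultiplier,  // driven by recent median fee levels
    std::uint64_t baseFeeLevel = 256)    // reference fee level
{
    // Until the open ledger reaches the expected size, the reference
    // (base) fee level is enough to get in directly.
    if (openTxCount <= expectedLedgerSize)
        return baseFeeLevel;

    // Past the limit, the required level grows with the square of the
    // open ledger size relative to the expected size.
    auto const n = static_cast<std::uint64_t>(openTxCount);
    auto const t = static_cast<std::uint64_t>(expectedLedgerSize);
    return escalationMultiplier * n * n / (t * t);
}
```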

src/ripple/app/misc/impl/TxQ.cpp (resolved)
@scottschurr (Collaborator) left a comment:

I'm only partway through the review, but here are a few thoughts I had while looking through the first commit.

src/ripple/app/misc/TxQ.h (resolved)
/// Reference transaction fee level
std::uint64_t referenceFeeLevel;
/// Minimum fee level to get in the queue
std::uint64_t minFeeLevel;
Collaborator:

Noting the difference between minFeeLevel and expFeeLevel. minFeeLevel is to get into the TxQ. expFeeLevel is to get into the current open ledger. Their names don't reflect that they operate on fundamentally different stuff (the TxQ vs the open ledger). I think that could be an added point of confusion in an already confusing area.

I don't mind long names, so I'll suggest txQMinFeeLevel and estOpenLedgerFeeLevel. I know that others are allergic to long names, so I don't necessarily expect you to use those.

Collaborator (author):

The thinking behind those names, IIRC, is that minFeeLevel is the minimum level to even be considered, while expFeeLevel is the expected level to be processed immediately.

What do you think about minProcessingFeeLevel and openLedgerFeeLevel? I think I'll also update the documentation again:

  • minProcessingFeeLevel : minimum fee level for a transaction to be considered for the open ledger or the queue
  • openLedgerFeeLevel: minimum fee level to get into the current open ledger, bypassing the queue
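
For reference, here is how those two fields might read with the proposed names and doc comments. This is a hypothetical excerpt based on the suggestion above, not the committed code:

```cpp
#include <cstdint>

// Hypothetical excerpt reflecting the rename proposed above.
struct Metrics
{
    /// Reference transaction fee level.
    std::uint64_t referenceFeeLevel;
    /// Minimum fee level for a transaction to be considered
    /// for the open ledger or the queue.
    std::uint64_t minProcessingFeeLevel;
    /// Minimum fee level to get into the current open ledger,
    /// bypassing the queue.
    std::uint64_t openLedgerFeeLevel;
};
```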

boost::optional<TxConsequences const> consequences;
};

/**
Structure returned by @ref TxQ::getTxs to describe
all transactions in the queue for the open ledger.
Collaborator:

This description doesn't look right. The description talks about "all transactions". But the data structure contains a single shared_ptr to a single transaction. Does getTxs() return one of these for each transaction in the queue? If so, then a better comment might be...

"Structure that describes one transaction that should be considered for the open ledger."

Collaborator (author):

Looking at this description again, the phrasing is super-awkward. getTxs is what returns the descriptions for all transactions... I'm going to rewrite this one.

/** The *intermediate* result returned by @ref preflight before
this transaction was queued. This will always be
`tesSUCCESS`.
*/
Collaborator:

Hmmm. Interesting. If preflightResult is always tesSUCCESS, then what's the point in storing the value? Or even asking about it?

@ximinez (Collaborator, author) commented Sep 19, 2018:

IIRC, when I initially created the field, I was concerned there might be cases where it had some other value. Once I was convinced that there weren't, it didn't occur to me to remove it. I'll double-check, and do that now.

This description is wrong. There is an edge case where the result may not be tesSUCCESS - specifically if the rules or flags change after a transaction is queued, then TxQ::MaybeTx::apply can preflight again. It's unlikely, but I believe it's possible for that process to return a code that is not tesSUCCESS, but does not cause the transaction to be immediately removed from the queue, particularly if the attempt was made by TxQ::tryClearAccountQueue (speaking of complexity...).

Collaborator:

Thanks for checking. You'll be adding a comment about the corner case I assume...

Collaborator (author):

Yep!

src/ripple/app/misc/TxQ.h (resolved)
Computes the total fee level for all transactions in a series.
Assumes that there are already more than @ref txnsExpected_ txns
in the view. If there aren't, the math still works, but the level
will be higher than actually required.
Collaborator:

Does the "the math still works" mean that the math doesn't crash? The phrase, "the level will be higher than actually required" suggests that the math doesn't really work...

Collaborator (author):

"The math still works" means you're not going to get overflows, underflows, imaginary numbers, sqrt(pi), etc. I've rephrased the comment.

@param seriesSize Total number of transactions in the series to be
processed.

@return A `std::pair` as returned from `mulDiv` indicating whether
Collaborator:

What are the conditions when the returned bool might be false? Perhaps this would be a good use for a boost::optional?

Collaborator (author):

That's why I reference mulDiv. https://github.com/ximinez/rippled/blob/txq_doc/src/ripple/basics/mulDiv.h#L29

The bool can be false if the calculation overflows. The reason it returns a pair instead of an optional is that it populates a maxint value for the result, which the caller can use anyways if it wants.
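
A minimal stand-in sketch of that contract, illustrating the pair-with-saturated-value behavior described above. The real helper is in ripple/basics/mulDiv.h; the names here (`mulDivSketch`, `scaledLevel`) are invented for illustration:

```cpp
#include <cstdint>
#include <limits>
#include <utility>

// Stand-in for illustration only: returns {ok, value * mul / div}, where
// ok is false and the value saturates to the maximum on overflow.
std::pair<bool, std::uint64_t>
mulDivSketch(std::uint64_t value, std::uint64_t mul, std::uint64_t div)
{
    auto constexpr max = std::numeric_limits<std::uint64_t>::max();
    if (mul != 0 && value > max / mul)
        return {false, max};            // overflow: caller can still use max
    return {true, value * mul / div};   // assumes div != 0
}

// Example caller that tolerates overflow by using the saturated value.
std::uint64_t
scaledLevel(std::uint64_t level, std::uint64_t num, std::uint64_t den)
{
    auto const result = mulDivSketch(level, num, den);
    // result.first == false means the multiplication overflowed; the second
    // member is already pegged at the max, which is a safe (maximally
    // conservative) fee level to report.
    return result.second;
}
```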

Collaborator:

Hmmm. Fair. If I'm desperate enough I can go look up the documentation on mulDiv at the cost of some small additional annoyance 😉 .

Collaborator (author):

I did update the description slightly, to mention that the flag indicates overflow.
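
Tying the two threads above together, the "total fee level for all transactions in a series" can be pictured as summing, for each successive open-ledger slot, the escalated level that slot would need. This is a hedged sketch only: the names and signature are invented, it assumes the open ledger is already past the expected size (otherwise, as the quoted comment says, it overestimates), `expectedLedgerSize` is nonzero, and the real code returns a mulDiv-style pair so overflow is detected rather than ignored:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: total level for `seriesSize` transactions entering the
// open ledger one after another, after `openTxCount` transactions already
// present plus `extraCount` about to be applied ahead of the series.
std::uint64_t
totalSeriesFeeLevel(
    std::size_t openTxCount,
    std::size_t extraCount,
    std::size_t seriesSize,
    std::size_t expectedLedgerSize,      // assumed nonzero and already exceeded
    std::uint64_t escalationMultiplier)
{
    std::uint64_t total = 0;
    auto const t = static_cast<std::uint64_t>(expectedLedgerSize);
    for (std::size_t i = 1; i <= seriesSize; ++i)
    {
        // Level the i-th transaction of the series would need to bypass the
        // queue, given everything that lands in the ledger before it.
        auto const n =
            static_cast<std::uint64_t>(openTxCount + extraCount + i);
        total += escalationMultiplier * n * n / (t * t);
    }
    return total;
}
```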

@@ -287,34 +440,65 @@ class TxQ
std::size_t extraCount, std::size_t seriesSize);
};

/**
Represents transactions in the queue which may be applied
later to the open ledger.
Collaborator:

Are there transactions in the queue that may never be applied to an open ledger? If so, why?

Collaborator (author):

There are several ways a transaction could fail to apply. Item 4 at https://github.com/ximinez/rippled/blob/txq_doc/src/ripple/app/misc/FeeEscalation.md#transaction-queue explains the possible results for a transaction in the queue.

Collaborator:

Fair. I think what you're saying is that all transactions in the queue are intended to be applied to an open ledger (where the transaction may fail). Any given transaction in the queue may expire before that attempt is made. But there are no transactions in the queue which we know we will never apply to an open ledger.

And, no, I'm not going to ask you to put that in the comments 😉

Collaborator (author):

Correct. In fact, the biggest qualification for whether a transaction can be put into the queue is whether it is "likely to successfully claim a fee". And that is in the docs somewhere. 😀

TxSeq const sequence;
/// Flags provided to `apply`. If the transaction is later
/// attempted with different flags, it will need to be
/// `preflight`ed again.
Collaborator:

What would cause the flags to change?

Collaborator (author):

Collaborator:

Thanks. Sorry, I wasn't thinking about the flags in a broad enough sense. The comment is good as is.

Collaborator (author):

No problem.

on the current metrics. If uninitialized, there is not limit,
but that condition can not last for long in practice.
@note This member must always and only be accessed under
locked mutex_
Collaborator:

Two suggestions:

  • "... there is no limit ..."
  • "... condition cannot last ..."

Collaborator (author):

Fixed

@scottschurr (Collaborator) left a comment:

Generally looks good. I mostly had nits and questions.

I think my biggest concern is that I'd like you to convince me why this doesn't need to be on an amendment. I agree that this change doesn't directly affect any individual transaction. But it influences which transactions are considered for the open ledger. So at the point where this change is partially installed on the network, different servers will build different ledgers. I think this could cause desyncing. And the validators might have more trouble coming to consensus during the transition period. Putting this change on an amendment would stabilize the transition.

I'm open to being convinced otherwise.

cfg/rippled-example.cfg (resolved)
* This will cause the first 21 transactions only require 10
drops, but the 22nd transaction will require
a level of about 355,000 or about 13,800 drops.
* This will cause the first 26 transactions only require 10
Collaborator:

"... transactions to only ..."?

Collaborator (author):

Rewritten

@@ -100,7 +107,7 @@ ledger from highest [fee level](#fee-level) to lowest. These transactions
count against the open ledger limit, so the required [fee level](#fee-level)
may start rising during this process.
3. Once the queue is empty, or required the [fee level](#fee-level)
Collaborator:

"... or required the ..." -> "... or the required ..."?

Collaborator (author):

Fixed

src/ripple/app/misc/FeeEscalation.md (resolved)
src/ripple/app/misc/FeeEscalation.md (resolved)
value of 20% was chosen to allow the limit to grow quickly as load
increases, but not so quickly as to allow bad actors to run unrestricted.
The reduction value of 50% was chosen to cause the limit to drop
significantly, but not so drastically that the limit can not quickly
Collaborator:

Micro nit: you might look around these files for instances of "can not" and consider using "cannot" instead. I believe that "cannot" is, under most circumstances, considered the more desirable spelling.

Collaborator (author):

I would swear that I remember from my high school grammar rules that "cannot" is not only less preferred, but not legit at all. But I looked around a little bit, and yeah, I'm just wrong.

src/ripple/app/misc/TxQ.h (resolved)
src/test/app/TxQ_test.cpp (resolved)
@codecov-io commented Sep 19, 2018

Codecov Report

Merging #2682 into develop will decrease coverage by <.01%.
The diff coverage is 92.3%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2682      +/-   ##
===========================================
- Coverage    70.02%   70.01%   -0.01%     
===========================================
  Files          703      704       +1     
  Lines        55251    55272      +21     
===========================================
+ Hits         38687    38698      +11     
- Misses       16564    16574      +10
| Impacted Files | Coverage Δ |
|---|---|
| src/ripple/app/misc/impl/TxQ.cpp | 95.37% <100%> (+0.07%) ⬆️ |
| src/ripple/app/misc/TxQ.h | 96.87% <100%> (-0.1%) ⬇️ |
| src/ripple/rpc/impl/TransactionSign.cpp | 90.07% <100%> (ø) ⬆️ |
| src/ripple/app/misc/NetworkOPs.cpp | 63.33% <72.72%> (-0.05%) ⬇️ |
| src/ripple/app/misc/SHAMapStoreImp.cpp | 72.53% <0%> (-0.24%) ⬇️ |
| src/ripple/beast/insight/impl/StatsDCollector.cpp | 0.39% <0%> (-0.01%) ⬇️ |
| src/ripple/rpc/json_body.h | 0% <0%> (ø) ⬆️ |
| src/ripple/overlay/impl/ConnectAttempt.cpp | 0% <0%> (ø) ⬆️ |
| ...le/beast/container/detail/aged_ordered_container.h | 97.27% <0%> (ø) ⬆️ |
| src/ripple/overlay/impl/PeerImp.cpp | 0% <0%> (ø) ⬆️ |
| ... and 4 more | |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b0092ae...0350acf. Read the comment docs.

@ximinez (Collaborator, author) commented Sep 20, 2018

> I think my biggest concern is that I'd like you to convince me why this doesn't need to be on an amendment.

The biggest reason I can give is precedent, including another PR that used a different "grow quickly, shrink slowly" method (#2235), one from just a few months ago that affected when transactions could come out of the ledger (#2567), the one that allowed clearing those "series" we discussed earlier (#1800), and the one that added support for multiple transactions per account (#1639).

But precedent is not conclusive.

These are my other reasons why I think this doesn't need an amendment:

  1. conflicts like this will get dropped in the first round of consensus along with any other timing-related conflicts.
  2. the network hasn't been busy enough for this to be an issue - yet. I want to be ready for if/when it is.
  3. nodes aren't likely to lose sync because they're not going to have to query the network for validated transactions they don't have - those transactions will be available locally. My understanding of desyncs is that they're more likely to happen when the validated ledger contains a lot of data that the node hasn't seen before. If I'm wrong about that, I guess that weakens this argument. 😄

@scottschurr (Collaborator) left a comment:

👍 Looks good as far as I can see. The expanded comments should help future maintainers a lot. It's a deep beastie, so I would not feel comfortable declaring there are no bugs. But there were not any that I spotted.

src/ripple/app/misc/impl/TxQ.cpp (resolved)
src/ripple/app/misc/TxQ.h (resolved)
src/ripple/app/misc/TxQ.h (resolved)
@ximinez (Collaborator, author) commented Sep 25, 2018

As discussed with @scottschurr this morning, the biggest reason that this change doesn't need an amendment is that it doesn't affect the way transactions are processed through consensus. Even if consensus builds a ledger that includes transactions that a given node doesn't think paid enough, it will still include those transactions in its validated ledger before continuing on. And since the transactions are just as likely to be available locally, the node is no more or less likely to desynchronize from the network.

I'm going to go ahead and mark this PR as passed. If there are any objections, I'll undo it.

@ximinez ximinez added the "Passed" label (Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.) on Sep 25, 2018
@nbougalis (Contributor) commented:

Please squash this down.

@ximinez ximinez force-pushed the txq_doc branch 2 times, most recently from ce450f2 to 0d15a04 on September 28, 2018 15:49
@ximinez (Collaborator, author) commented Sep 28, 2018

Squashed and rebased onto 1.1.0.

@ximinez ximinez removed the Rebase label Sep 28, 2018
@nbougalis nbougalis mentioned this pull request Oct 1, 2018
* Rename a couple of member variables for clarity.
* When increasing the expected ledger size, add on an extra 20%.
* When decreasing the expected ledger size, take the minimum of the
  validated ledger size or the old expected size, and subtract another 50%.
* Update fee escalation documentation.
* Refactor the FeeMetrics object to use values from Setup
* Resolves RIPD-1630.
@nbougalis (Contributor) commented:

Merged as e14f913 and 7295cf9.

@nbougalis nbougalis closed this Oct 1, 2018
@ximinez ximinez deleted the txq_doc branch October 1, 2018 22:54