Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed 0.70.2 hotfix #2231

Merged
merged 2 commits into from
Sep 21, 2017
Merged

Proposed 0.70.2 hotfix #2231

merged 2 commits into from
Sep 21, 2017

Conversation

ximinez
Copy link
Collaborator

@ximinez ximinez commented Sep 19, 2017

Recover old open ledger transactions to the queue:

  • Recover to the open ledger once, then to the queue.
  • If transaction fails to queue for any reason, drop it.
  • New result codes for transactions that can not queue.
  • Add minimum queue size
  • RIPD-1530
    fix Recent fee rises and TxQ issues #2215

@ximinez ximinez force-pushed the relayopenledger.70 branch 2 times, most recently from b2c08e5 to 8c2fd48 Compare September 19, 2017 18:51
@ximinez
Copy link
Collaborator Author

ximinez commented Sep 19, 2017

Build and test problems. Closing to fix.

@ximinez ximinez closed this Sep 19, 2017
if (app.getHashRouter().shouldRecover(tx->getTransactionID()))
return ripple::apply(
app, view, *tx, flags, j);
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to nitpick, but please use {} here. Yes, it's not strictly necessary, but the comment block following the else is large and the curlies will help make the code easier to read.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, and entirely sensible.

auto const result = [&]
{
if (app.getHashRouter().shouldRecover(tx->getTransactionID()))
return ripple::apply(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

micro-nit: single line works fine here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@note The limit is signed while the counter is unsigned.
A negative limit will retry forever.
*/
bool shouldRecover(std::int32_t limit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't care about negative limits (why should we?) then this can be neatly simplified as the following, if we increase limit by 1:

bool shouldRecover(std::int32_t limit)
{
    return ++recoveries_ % limit != 0;
}

This reads better, at least to me. And, as an extra benefit, it tracks exactly how many times we've tried to recover the item.

Also, I'd also make both limit and recoveries_ be unsigned int.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to leave the option available to basically remove the limitations by setting a negative limit.

I also wanted an option to have a limit of 0, which would basically prevent retries entirely.

If we don't want either of those, your optimization makes sense (and was my initial approach, too). (I did have another concern about integer overflow, but if a tx retries 2^32 times, that's a whole other issue.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion, changed for the simplification

@@ -303,6 +308,7 @@ RCLConsensus::onClose(
// Build SHAMap containing all transactions in our open ledger
for (auto const& tx : initialLedger->txs)
{
JLOG(j_.debug()) << "Adding open ledger TX " << tx.first->getTransactionID();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Micro-nit: indentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

canBeHeld = accountIter == byAccount_.end() ||
replacementIter ||
accountIter->second.getTxnCount() <
setup_.maximumTxnPerAccount;
setup_.maximumTxnPerAccount ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This construct has now, officially, become impossible to read.

Is this better maybe?

/* Limit the number of transactions an individual account
   can queue. Mitigates the lost cost of relaying should
   an early one fail or get dropped.
 */
if (canBeHeld)
{
    if (accountIter != byAccount_.end())
        canBeHeld = false;

    if (!canBeHeld && replacementIter)
        canBeHeld = true;
    
    if (!canBeHeld && 
            accountIter->second.getTxnCount() < setup_.maximumTxnPerAccount)
        canBeHeld =  true;

    // Allow the transaction to get in front of the first queued
    // transaction. This allows recovery of open ledger transactions
    // and stuck transactions.
    if (!canBeHeld &&
            tx.getSequence() < accountIter->second.transactions.begin()->first)
        canBeHeld = true;
}

P.S.: I am convinced that the above logic is correct and equivalent to what's already there, but please double-check before using this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated it differently, but I think you'll like it.

@ximinez ximinez reopened this Sep 19, 2017
@ximinez ximinez force-pushed the relayopenledger.70 branch 2 times, most recently from 1368040 to 86892cd Compare September 19, 2017 22:54
Copy link
Contributor

@nbougalis nbougalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the issue with bool shouldRecover(std::uint32_t limit)

@@ -163,6 +164,10 @@ RCLConsensus::relay(RCLCxTx const& tx)
app_.overlay().foreach (send_always(
std::make_shared<Message>(msg, protocol::mtTRANSACTION)));
}
else
{
JLOG(j_.debug()) << "Not relaying disputedtx " << tx.id();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space between disputed and tx

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Serializer s;

tx->add(s);
msg.set_rawtransaction(&s.getData().front(), s.getLength());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do:
msg.set_rawtransaction(s.data(), s.size());

*/
bool shouldRecover(std::uint32_t limit)
{
return ++recoveries_ % limit != 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So getDefaultRecoverLimit returns 1 and since x mod 1 is always 0 this will always return false. If we make getDefaultRecoverLimit return 2, then this function will oscillate between true and false. Not sure if that's what we want.

I know that I was the one that suggested this change, but I'm now thinking that this should be reduced to:

bool shouldRecover(std::uint32_t limit)
{
    return recoveries_++ < limit;
}

With limit == 1 this function will return to the sequence true, false, false, false, ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    HashRouter (Stopwatch& clock, std::chrono::seconds entryHoldTimeInSeconds,
        std::uint32_t recoverLimit)
        : suppressionMap_(clock)
        , holdTime_ (entryHoldTimeInSeconds)
        , recoverLimit_ (recoverLimit + 1u)
    {
    }

That does the right thing, works after rediscovery, and allows the API to specify "recover this many times before giving up."

setup_.maximumTxnPerAccount;

// Allow if the account is not in the queue at all
canBeHeld = accountIter == byAccount_.end();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is much more readable!

/** Determines if this item should be recovered from the open ledger.

Counts the number of times the item has been recovered.
If it hits the limit, reset the counter and return false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset the counter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased.

// already had a chance via disputes
hint->second = false;
else
shouldRecover.emplace_hint(hint, txID,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why emplace_hint if hint is end()? Is it because we know this is the highest tx added from current_ (even though there may be larger disputed tx ids in the map)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I mixed my metaphors and used the wrong function. Should be right now.

@@ -1454,6 +1482,7 @@ setup_TxQ(Config const& config)
TxQ::Setup setup;
auto const& section = config.section("transaction_queue");
set(setup.ledgersInQueue, "ledgers_in_queue", section);
set(setup.queueSizeMin, "minimum_queue_size", section);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be documented in rippled-example.cfg

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, yeah.

{ telCAN_NOT_QUEUE_BLOCKS, { "telCAN_NOT_QUEUE_BLOCKS", "Can not queue at this time: would block later queued transaction(s)." } },
{ telCAN_NOT_QUEUE_BLOCKED, { "telCAN_NOT_QUEUE_BLOCKED", "Can not queue at this time: blocking transaction in queue." } },
{ telCAN_NOT_QUEUE_FEE, { "telCAN_NOT_QUEUE_FEE", "Can not queue at this time: fee insufficient to replace queued transaction." } },
{ telCAN_NOT_QUEUE_FULL, { "telCAN_NOT_QUEUE_FULL", "Can not queue at this time: queue is full." } },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closing curly braces aren't aligned with the rest of the file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't really help readability for me, since the lines are already so insane long, but fixed.

* If the transaction can't be queued, recover to the open ledger once,
  and drop it on the next attempt.
* New result codes for transactions that can not queue.
* Add minimum queue size.
* Remove the obsolete and incorrect SF_RETRY flag.
* fix XRPLF#2215
@nbougalis nbougalis merged commit cd2d52a into XRPLF:master Sep 21, 2017
@ximinez ximinez deleted the relayopenledger.70 branch September 21, 2017 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants