Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gateway_balances): handle overflow exception #4355

Merged
merged 2 commits into from
Mar 16, 2023

Conversation

RichardAH
Copy link
Collaborator

@RichardAH RichardAH commented Nov 30, 2022

High Level Overview of Change

There's a bug in the sum totalling routine for gateway_balances. It's quite possible to overflow the sum total and cause an rpc internal error to be returned (exception caught high up).

Of the possible ways to deal with this I decided to stop summing at the point of overflow, return the last good sum and signal to the end user that an overflow occurred with a new JSON field. Possibly there is a better way to do it but this seemed to be the most backwards compatible to me. 36ffe0c removed the "overflow" field. Instead, the server returns the maximum possible STAmount.

Example account:

{"command": "gateway_balances","account": "raiata7DoTJL5CMHuYC8BYGE6J4psyCMcE"}

On a stock node:

{"error":"internal","error_code":73,"error_message":"Internal error.","request":{"account":"raiata7DoTJL5CMHuYC8BYGE6J4psyCMcE","command":"gateway_balances","status":"error","type":"response"}

On a node running this PR 0d0464f:

{"result":{"account":"raiata7DoTJL5CMHuYC8BYGE6J4psyCMcE","ledger_current_index":76108352,"obligations":{"544845434954594F464749455A57410000000000":"9358546448305227e80"},"overflow":"true","validated":false},"status":"success","type":"response"}

Context of Change

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Tests (You added tests for code that already exists, or your new feature included in this PR)
  • Documentation Updates
  • Release

@RichardAH
Copy link
Collaborator Author

Longest suite times:
   42.7s ripple.tx.Offer
   35.1s ripple.app.ValidatorSite
   21.9s ripple.rpc.NodeToShardRPC
   21.4s ripple.tx.NFToken
   18.1s ripple.tx.NFTokenBurn
   14.1s ripple.app.ShardArchiveHandler
   14.0s ripple.app.Flow
   14.0s ripple.app.TheoreticalQuality
    8.2s ripple.app.LedgerReplayer
    8.0s ripple.tx.NFTokenDir
341.7s, 204 suites, 1527 cases, 532158 tests total, 0 failures

@WietseWind WietseWind requested a review from intelliot November 30, 2022 14:34
@WietseWind
Copy link
Member

WietseWind commented Nov 30, 2022

@intelliot FYI: This is an important one, some issuers reported to Xumm support that explorers, Xumm, everything suddenly doesn't report their incoming TrustLines and obligations anymore. This fix addresses that bug that now lives on all nodes.

@RichardAH
Copy link
Collaborator Author

After some thought: a better long term solution is to use an arbitrary precision (or much larger precision) floating point library (boost/multiprecision/cpp_dec_float ?) for these summing totals. But for now this is a suitable patch that can easily go into the next build

@intelliot
Copy link
Collaborator

Is there risk that an API consumer will overlook the "overflow":"true" and assume that the sum is accurate?

@mtrippled
Copy link
Collaborator

Is there risk that an API consumer will overlook the "overflow":"true" and assume that the sum is accurate?

The sum is not being returned at all.

I'm approving this because bogus data is not being returned and it fixes the problem users are having.

Copy link
Collaborator

@mtrippled mtrippled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM
The API docs will need to be updated to reflect this (@mDuo13 ?).

@RichardAH
Copy link
Collaborator Author

Is there risk that an API consumer will overlook the "overflow":"true" and assume that the sum is accurate?

Yes but that’s a gotcha with delivered amount also. Needs proper documentation to make sure implementations check for the overflow key. It’s probably better to continue returning a k-v pair for the currency in question otherwise at point of overflow the client would stop even displaying their currency at all and just say error. Which is already what the behaviour was.

@mtrippled
Copy link
Collaborator

@RichardAH
Oh--I didn't read the results clearly. I guess we are returning the incorrect amount in "obligations". I don't like returning a bogus value and assuming that the user will re-code their application to catch a brand new field--nobody will do this until they get bitten. Can we instead put a "null" value in the relevant "obligations" field for each currency type?

@RichardAH
Copy link
Collaborator Author

@RichardAH Oh--I didn't read the results clearly. I guess we are returning the incorrect amount in "obligations". I don't like returning a bogus value and assuming that the user will re-code their application to catch a brand new field--nobody will do this until they get bitten. Can we instead put a "null" value in the relevant "obligations" field for each currency type?

Probably the most sensible value to place there is MAX_FLOAT. Or actually compute the value properly with an arbitrary precision floating point library and return this?
Returning NULL is probably not very useful information. But I agree with you that returning a wrong value is probably not much more useful.

@scottschurr
Copy link
Collaborator

I did a little exploring of this pull request. I've never looked in GatewayBalances before, so I may be misunderstanding what's going on. Please straighten me out if I've walked off the path.

Given the fix, it looks like we believe that adding a positive non-XRP STAmount to a different positive non-XRP STAmount causes the resulting sum to overflow the valid value for STAmount. That's just a guess, since the pull request has no unit tests.

If I got that right, then I think there are some things worth thinking through.

  1. The maximum valid value of a non-XRP STAmount is 9999999999999999e80. To throw an exception the sum must exceed that value.
  2. A non-XRP STAmount is a floating point like number. That means adding a small number to very large number may result in the very large number not changing at all. Consider this in STAmount arithmetic:
       9999999999999999e80 + 1000000000000000e64 == 9999999999999999e80
  1. Given that adding a non-zero value to an STAmount may not change that STAmount, we're already in the space where we are returning "incorrect amounts". And we have been for as long as we've been returning a sum of STAmount of widely varying sizes.
  2. Given that the returned value has never been accurate if the sum gets large, it may make sense to return the largest valid STAmount if there is an overflow exception. Returning that value is as accurate as we can make it using STAmount.

If the above steps make sense, then a possible fix might look like this:

                        try
                        {
                            bal -= rs->getBalance();
                        }
                        catch (std::runtime_error const&)
                        {
                            // Presumably the exception was caused by overflow.
                            // On overflow return the largest valid STAmount.
                            // Very large sums of STAmount are approximations
                            // anyway.
                            bal = STAmount(
                                bal.issue(),
                                STAmount::cMaxValue,
                                STAmount::cMaxOffset);
                        }

While I was working on this I wrote a unit test so I could validate some of these assumptions. The top-most commit on this branch includes the proposed fix as well as the unit test: https://github.com/scottschurr/rippled/commits/RichardAH-gateway_balances_overflow You're welcome to cherrypick that commit if you want to.

@scottschurr
Copy link
Collaborator

Flagging @godexsoft since this change may impact the behavior of clio.

@intelliot intelliot removed the request for review from cjcobb23 January 12, 2023 05:20
@intelliot
Copy link
Collaborator

@WietseWind @RichardAH please review the comment from Scott S above, at your convenience.

@RichardAH
Copy link
Collaborator Author

While I was working on this I wrote a unit test so I could validate some of these assumptions. The top-most commit on this branch includes the proposed fix as well as the unit test: https://github.com/scottschurr/rippled/commits/RichardAH-gateway_balances_overflow You're welcome to cherrypick that commit if you want to.

Thanks for this, I've cherry-picked it. LGTM

@intelliot intelliot assigned godexsoft and unassigned RichardAH Mar 15, 2023
Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Collaborator

@godexsoft godexsoft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. We have added similar code in clio too.

@intelliot intelliot changed the title catch overflow exception in gateway_balances fix(gateway_balances): handle overflow exception Mar 16, 2023
@intelliot intelliot merged commit 10555fa into XRPLF:develop Mar 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants