-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Periodically pause copying ledger nodes during online_delete #4907
base: develop
Are you sure you want to change the base?
Conversation
* Gives other processes (notable ledger persistence during consensus) more time to complete their writes. * The period is set as half the node store batch write limit size, so that there is plenty of room for other processes. * Reuses the `back_off_milliseconds` configuration value, which is used for other database delays.
Internal tracker: https://ripplelabs.atlassian.net/browse/RPFC-107 |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #4907 +/- ##
=======================================
Coverage 77.9% 77.9%
=======================================
Files 783 783
Lines 66677 66682 +5
Branches 8108 8106 -2
=======================================
+ Hits 51921 51929 +8
+ Misses 14756 14753 -3
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM
* upstream/develop: Set version to 2.1.0-rc1 `fixInnerObjTemplate`: set inner object template (4906) feat: allow `port_grpc` to be specified in `[server]` stanza (4728) build: add headers needed in Conan package for libxrpl (4885) `fixNFTokenReserve`: ensure NFT tx fails when reserve is not met (4767) fix(libxrpl): change library names in Conan recipe (4831) test: check for success/failure of Windows CI unit tests (4871)
* upstream/develop: Set version to 2.1.0 test: guarantee proper lifetime for temporary Rules object: (4917)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM!
* upstream/develop: fix compile error on gcc 13: (4932) Price Oracle (XLS-47d): (4789) (4789)
* upstream/develop: Set version to 2.2.0-b1
* upstream/develop: build: add STCurrency.h to xrpl_core to fix clio build (4939) feat: add user version of `feature` RPC (4781) Fast base58 codec: (4327) Remove default ctors from SecretKey and PublicKey: (4607)
* upstream/develop: Update remaining actions (4949) Upgrade to xxhash 0.8.2 as a Conan requirement, enable SIMD hashing (4893) Fix workflows (4948) fix: order book update variable swap: (4890) Embed patched recipe for RocksDB 6.29.5 (4947)
* upstream/develop: Install more public headers (4940)
* upstream/develop: test: Env unit test RPC errors return a unique result: (4877)
* upstream/develop: fixXChainRewardRounding: round reward shares down: (4933) Don't reach consensus as quickly if no other proposals seen: (4763) Write improved `forAllApiVersions` used in NetworkOPs (4833) Remove zaphod.alloy.ee hub from default server list: (4903) Enforce no duplicate slots from incoming connections: (4944) `fixEmptyDID`: fix amendment to handle empty DID edge case: (4950) perf: improve `account_tx` SQL query: (4955) Fix workflows (4951)
* upstream/develop: chore: change Github Action triggers for build/test jobs (4956)
* upstream/develop: Set version to 2.1.1 fix: improper handling of large synthetic AMM offers:
* upstream/develop: chore: Default validator-keys-tool to master branch: (4943)
* upstream/develop: Set version to 2.2.0-b2 fix Conan component reference typo Remove unused lambdas from MultiApiJson_test
* Reduce the copy backoff interval from 32k to 128 * Increase the backoff time from 100ms to 2s
* upstream/develop: fix: resolve database deadlock: (4989) test: verify the rounding behavior of equal-asset AMM deposits (4982) test: Add tests to raise coverage of AMM (4971) chore: Improve codecov coverage reporting (4977) test: Unit test for AMM offer overflow (4986) fix amendment to add `PreviousTxnID`/`PreviousTxnLgrSequence` (4751)
* upstream/develop: Add xrpld build option and Conan package test (5052)
* upstream/develop: Update BUILD.md after PR 5052 (5067)
* upstream/develop: chore: Add comments to SignerEntries.h (5059) chore: Rename two files from Directory* to Dir*: (5058)
* upstream/develop: Ensure levelization sorting is ASCII-order across platforms (5072) fix: Fix NuDB build error via Conan patch (5061) Disallow filtering account_objects by unsupported types (5056)
* upstream/develop: Update gcovr EXCLUDE (5084) Fix crash inside `OverlayImpl` loops over `ids_` (5071) Set version to 2.3.0-b2 docs: Document the process for merging pull requests (5010) Remove unused constants from resource/Fees.h (4856) fix: change error for invalid `feature` param in `feature` RPC (5063) Set version to 2.2.1 Use error codes throughout fast Base58 implementation Improve error handling in some RPC commands
* upstream/develop: Factor out Transactor::trapTransaction (5087) Remove shards (5066)
* upstream/develop: Address rare corruption of NFTokenPage linked list (4945)
* upstream/develop: chore: Fix documentation generation job: (5091) chore: libxrpl verification on CI (5028)
* upstream/develop: docs: Update options documentation (5083) refactor: Remove dead headers (5081) refactor: Remove reporting mode (5092)
* upstream/develop: Update Release Notes for 2.2.1 and 2.2.2 Set version to 2.2.2 Allow only 1 job queue slot for each validation ledger check Allow only 1 job queue slot for acquiring inbound ledger. Track latencies of certain code blocks, and log if they take too long
* upstream/develop: test: Retry RPC commands to try to fix MacOS CI jobs (5120)
* upstream/develop: Set version to 2.3.0-b4 feat(SQLite): allow configurable database pragma values (5135) refactor: re-order PRAGMA statements (5140) fix(book_changes): add "validated" field and reduce RPC latency (5096) chore: fix typos in comments (5094) Set version to 2.2.3 Update SQLite3 max_page_count to match current defaults (5114)
Merge remote-tracking branch 'upstream/develop' into shamapstore-backoff * upstream/develop: Expand Error Message for rpcInternal (4959) docs: clean up API-CHANGELOG.md (5064)
* upstream/develop: Consolidate definitions of fields, objects, transactions, and features (5122) Ignore reformat when blaming Reformat code with clang-format-18 Update pre-commit hook Update clang-format settings Update clang-format workflow
* upstream/develop: docs: Add protobuf dependencies to linux setup instructions (5156) fix: reject invalid markers in account_objects RPC calls (5046) Update RELEASENOTES.md (5154) Introduce MPT support (XLS-33d): (5143)
Even though this has two approvals, I'm changing it back to draft because early perf tests did not show any change, and there may be a better / different solution. |
* upstream/develop: Add hubs.xrpkuwait.com to bootstrap (5169)
* upstream/develop: Add AMMClawback Transaction (XLS-0073d) (5142)
* upstream/develop: Fix unity build (5179)
* upstream/develop: Set version to 2.3.0-rc1 Replace Uint192 with Hash192 in server_definitions response (5177) Fix potential deadlock (5124) Introduce Credentials support (XLS-70d): (5103) Fix token comparison in Payment (5172) Add fixAMMv1_2 amendment (5176)
* upstream/develop: fix: include `index` in `server_definitions` RPC (5190) Fix ledger_entry crash on invalid credentials request (5189)
* upstream/develop: Set version to 2.3.0-rc2
* upstream/develop: Set version to 2.3.0 refactor(AMMClawback): move tfClawTwoAssets check (5201) Add a new serialized type: STNumber (5121) fix: check for valid ammID field in amm_info RPC (5188)
* upstream/develop: test: Add more test cases for Base58 parser (5174) test: Check for some unlikely null dereferences in tests (5004) Add Antithesis intrumentation (5042)
* upstream/develop: refactor: clean up `LedgerEntry.cpp` (5199)
* upstream/develop: Enforce levelization in libxrpl with CMake (5111)
* upstream/develop: Set version to 2.4.0-b1 fix: Add header for set_difference (5197) fix: allow overlapping types in `Expected` (5218) Add MPTIssue to STIssue (5200) Antithesis instrumentation improvements (5213)
High Level Overview of Change
Mitigates disk write contention while old ledgers are being deleted, and specifically while the full ledger is being copied to the "new" node store.
Context of Change
PR #4503 (reverted by #4882) attempted to improve
rippled
performance by writing batches to NuDB asynchronously. However, it had an unintended side effect that whenonline_delete
writes the entire ledger to disk, it tends to cause the buffer to fill up, which results in blocking new ledgers from being persisted.Type of Change
Before / After
Cost: This change will cause
online_delete
to take significantly longer to copy the full ledger from the old node store to the new one, which is the last significant step in the process.Benefit:
rippled
should be much less likely to desync, and put less load on the disk duringonline_delete
.online_delete
is still a demanding process, so this won't be a panacea, but it should be a significant improvment. (Significant improvement to be measured.)Performance details
online_delete
and writing new ledgers.rippled
should struggle less duringonline_delete
to stay synced, and other functions.rippled
should put less strain/demand on the disk duringonline_delete
.online_delete
.Note that
back_off_milliseconds
is configurable, defaulting to 100. Node operators can de-prioritizeonline_delete
operations more by increasing this value to whatever they are comfortable with.