
Don't suspend writes based on batch size. Instead, log if it seems to be too big #4762

Closed. Wants to merge 1 commit.

Conversation


@mtrippled mtrippled commented Oct 12, 2023

High Level Overview of Change

Removes a limit that blocks nodestore writes. Blocking these writes delays consensus, because each new ledger is persisted during consensus processing.

Context of Change

This removes the check that limits write batch size and blocks writers when the limit is reached, and adds logging to indicate when the limit is exceeded. The trade-off is that pending writes consume memory until they are persisted, so without the limit memory use can grow. But since blocked writes cause consensus instability, the existing behavior is a proactive pessimization. Ultimately, pending writes grow either because of increased input (transaction volume) or because of I/O slowdown. If the I/O and memory subsystems can't handle the throughput, the process will be unstable regardless.
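As a rough illustration of the change, here is a minimal sketch of a batch writer that accepts writes unconditionally and only logs when the batch grows past the limit. This is not the actual rippled code; the names mWriteSet and batchWriteLimitSize mirror the PR's diff context, but the class and logging behavior are this sketch's reading of the change.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the nodestore's pending-write batch.
class BatchWriter
{
    std::mutex mWriteMutex;
    std::vector<std::string> mWriteSet;
    static constexpr std::size_t batchWriteLimitSize = 65536;

public:
    // Old behavior: block the caller while
    // mWriteSet.size() >= batchWriteLimitSize.
    // New behavior: accept the write and only log when the
    // batch has grown past the limit. Returns the batch size.
    std::size_t store(std::string object)
    {
        std::lock_guard<std::mutex> lock(mWriteMutex);
        mWriteSet.push_back(std::move(object));
        if (mWriteSet.size() > batchWriteLimitSize)
            std::cerr << "Pending write batch unusually large: "
                      << mWriteSet.size() << " objects\n";
        return mWriteSet.size();  // never blocks, regardless of size
    }
};
```

The point of the sketch is the trade-off in the paragraph above: the caller is never suspended, so memory can grow, but consensus-critical code paths are never stalled behind the batch.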

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Tests (You added tests for code that already exists, or your new feature included in this PR)
  • Documentation Updates
  • Performance
  • Release

@intelliot intelliot changed the title Don't suspend writes based on batch size. Instead, Don't suspend writes based on batch size. Instead, log if it seems to be too big Oct 12, 2023
@intelliot intelliot added this to the TPS milestone Oct 12, 2023
@HowardHinnant (Contributor) commented:

This looks fine to me, but I have a question: It looks like the code just adds logging. I don't see code that does the other part of the title: "Don't suspend writes based on batch size." What am I missing?

```cpp
{
    std::unique_lock<decltype(mWriteMutex)> sl(mWriteMutex);

    // If the batch has reached its limit, we wait
    // until the batch writer is finished
    while (mWriteSet.size() >= batchWriteLimitSize)
```
@mtrippled (Collaborator, Author) replied:

@HowardHinnant Here's the logic change.

@mtrippled (Collaborator, Author) commented:

> This looks fine to me, but I have a question: It looks like the code just adds logging. I don't see code that does the other part of the title: "Don't suspend writes based on batch size." What am I missing?

fixed

@seelabs (Collaborator) left a comment:

👍 This patch fixes a potential deadlock as well. Needs to be reformatted before merging though. Nice job Mark!

@mtrippled mtrippled added the "Passed" label (Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.) Nov 22, 2023
@intelliot intelliot modified the milestones: TPS, 2.0.1 Nov 29, 2023
@intelliot intelliot self-assigned this Jan 3, 2024
@intelliot intelliot added the "Perf Attn Needed" label (Attention needed from RippleX Performance Team) and removed the "Performance/Resource Improvement" label Jan 8, 2024
@intelliot intelliot requested a review from sophiax851 January 8, 2024 22:38
@intelliot intelliot assigned sophiax851 and unassigned intelliot Jan 8, 2024
@intelliot (Collaborator) commented:

@sophiax851 this may require perf signoff

@intelliot (Collaborator) commented:

Internal tracker: RPFC-80

@intelliot (Collaborator) commented:

May be replaced by #4882

@intelliot (Collaborator) commented Jan 18, 2024:

  • What happens when we want to write faster than NuDB is able to write? We have to do something to slow down the write rate.
  • Could we just slow down the writes since there is no rush?
  • Yes, but we don't want to block all threads writing to NuDB. We could add a priority and make online_delete low priority. (That is the only thing that really generates surges of writes; also consider: fetching ledger history; closing a new ledger is a very small surge)
    • If buffer is somewhat full, pause the low priority writes.
    • Need to expose the "progress" of online_delete since it will take more time.
  • Need to bound memory use so it's not infinite. Current approach allows writes to grow at a certain rate and if writes exceed the rate, we slow down all code that tries to write. It would slow down online_delete which is something we should do; but that would also slow down high-priority code which is advancing the ledger.
  • NuDB tracks its own bytes/sec or objects/sec write rate and can tell if writes are coming in faster than that. It can measure what % of the server's capacity is being used. What should we do with that information? We can slow something down, but what?
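One way the priority idea above could look, as a minimal sketch: ledger-advance writes are always accepted, while bulk low-priority writes (online_delete-style surges) are paused once the pending buffer is somewhat full. The class name, threshold, and retry-on-reject contract are illustrative assumptions, not rippled's actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

enum class WritePriority { High, Low };

// Hypothetical prioritized write queue: low-priority producers are
// expected to retry later when submit() returns false.
class PrioritizedWriteQueue
{
    std::deque<std::string> pending_;
    std::size_t softLimit_;

public:
    explicit PrioritizedWriteQueue(std::size_t softLimit)
        : softLimit_(softLimit) {}

    // Returns true if the write was accepted. Low-priority writes
    // are paused (rejected) once the buffer reaches the soft limit,
    // so they cannot crowd out ledger advancement.
    bool submit(std::string object, WritePriority prio)
    {
        if (prio == WritePriority::Low && pending_.size() >= softLimit_)
            return false;  // pause online_delete-style surges
        pending_.push_back(std::move(object));
        return true;
    }

    std::size_t size() const { return pending_.size(); }
};
```

As the notes observe, this buys bounded pressure on low-priority work at the cost of exposing progress of online_delete to the caller, since that work now takes longer.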

@intelliot (Collaborator) commented:

#4503 writes asynchronously to NuDB with a background thread, but it has an upper limit on the number of records that can be queued for writing. When the limit is reached, writes block. Online deletion copies a ledger and ensures that the entire ledger is written to NuDB. That amount of data likely causes further writes to block until it is persisted, including the records for each newly produced ledger, which are written synchronously during consensus. This causes desyncs. This PR (#4762) is expected to fix that.

If #4882 (which just reverts #4503) is merged, then #4503+#4762 (combined) can still be considered for testing+merging in the future.
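The failure mode described above can be modeled with a toy bounded queue: a bulk online_delete copy fills the queue, so the next consensus-critical ledger write would have to wait for the backlog to drain. This is entirely illustrative and is not the #4503 code; all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Toy model of a background writer with a bounded queue.
class BoundedWriteQueue
{
    std::deque<std::string> queue_;
    std::size_t capacity_;

public:
    explicit BoundedWriteQueue(std::size_t capacity)
        : capacity_(capacity) {}

    bool wouldBlock() const { return queue_.size() >= capacity_; }

    // Enqueue without blocking; returns false where a real caller
    // would have had to wait for the backlog to drain.
    bool tryEnqueue(std::string object)
    {
        if (wouldBlock())
            return false;
        queue_.push_back(std::move(object));
        return true;
    }

    // Models the background thread draining one record.
    void drainOne()
    {
        if (!queue_.empty())
            queue_.pop_front();
    }
};
```

In the model, a burst of bulk records fills the queue, and a subsequent small but latency-sensitive write is stalled behind it, which is the mechanism that produces the desyncs described above.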

@intelliot intelliot modified the milestones: 2.0.1, TPS Jan 19, 2024
@intelliot (Collaborator) commented:

note via @mtrippled: there is no reason to merge this PR. We can consider #4503 (comment): tweaking NuDB's write buffer is probably the better design, and we can design micro-benchmarks to show how it affects NuDB itself. Write prioritization adds too much complexity; making NuDB write faster was the whole point of #4503.
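A micro-benchmark of the kind suggested above might take roughly this shape: time a fixed number of writes and report objects per second, so different write-buffer sizes can be compared. KeyValueStore here is a placeholder interface, not NuDB's actual API, and the key/value shapes are arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <string>

// Placeholder store; a real benchmark would target NuDB with a
// given write-buffer configuration.
struct KeyValueStore
{
    std::map<std::string, std::string> data;
    void insert(std::string key, std::string value)
    {
        data.emplace(std::move(key), std::move(value));
    }
};

// Returns measured write throughput in objects per second.
double benchmarkWrites(KeyValueStore& store, std::size_t count)
{
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
        store.insert("key" + std::to_string(i), std::string(256, 'x'));
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() > 0 ? count / elapsed.count() : 0.0;
}
```

Running the same harness across several buffer sizes would show whether enlarging NuDB's write buffer actually raises sustainable throughput, which is the question the comment raises.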
