Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt(rollup): change the way rollups are done #7253

Merged
merged 21 commits into from
Jan 11, 2021
Merged

Conversation

jarifibrahim
Copy link
Contributor

@jarifibrahim jarifibrahim commented Jan 6, 2021

Fixes DGRAPH-2608

ISSUE:
In a load that has a lot of updations to a key, a lot of deltas accumulate over time. These keys are added to the batch for a rollup. Due to the lossy behavior of rollups, they get dropped while the dgraph is busy rolling up keys (that might have much smaller delta count).

Fix:
This PR changes the way rollups are done.

  • We now add keys to rollupBatch just after pushing the deltas over to badger.
  • We now have a priority inside the keys that we rollup. The keys for which the deltas have increased up to a certain limit (hard-coded 500), we add it to high priority rollupKeyPool so that it has high chances to get rolled up and does not get dropped due to lossy behavior of rollups.

Results:
With a workload where a set of values are frequently updated:

  • On master, the p directory size roars up to 6-7GB within 7-8 hours, and the deltaCount increases to 7-9k for lots of keys.
  • On this branch, p directory size stays bounded in range 1-2GB even after >24 hours of workload.

This change is Reviewable

Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @NamanJain8 is the disk size bounded after this change?

posting/mvcc.go Outdated Show resolved Hide resolved
@jarifibrahim
Copy link
Contributor Author

This PR needs a proper title and description.

@NamanJain8 NamanJain8 changed the title [Do not merge] chore(rollup): Add logs about rolling up the keys opt(rollup): change the way rollups are done Jan 11, 2021
posting/mvcc.go Outdated Show resolved Hide resolved
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Just some comments.

Reviewed 1 of 3 files at r5, 1 of 1 files at r6.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @ahsanbarkati, @jarifibrahim, @martinmr, and @vvbalaji-dgraph)


posting/mvcc.go, line 136 at r6 (raw file):

	doRollup := func(batch *[][]byte, priority int) {
		currTs := time.Now().Unix()

forcerollup is good to have. Use it over the priority pool.

Maybe add a TODO about dealing with opRollup not running.

@NamanJain8 NamanJain8 merged commit cbdc991 into master Jan 11, 2021
@NamanJain8 NamanJain8 deleted the ibrahim/delta-log branch January 11, 2021 18:46
@NamanJain8
Copy link
Contributor

LGTM. @NamanJain8 is the disk size bounded after this change?

Yes, it is. Mentioned that in PR description.

NamanJain8 pushed a commit that referenced this pull request Jan 12, 2021
ISSUE:
In a load that has a lot of updations to a key, a lot of deltas accumulate over time. These keys are added to the batch for a rollup. Due to the lossy behavior of rollups, they get dropped while the dgraph is busy rolling up keys (that might have much smaller delta count).

FIX:
We now add keys to rollupBatch just after pushing the deltas over to badger.
We now have a priority inside the keys that we rollup. The keys for which the deltas have increased up to a certain limit (hard-coded 500), we add it to high priority rollupKeyPool so that it has high chances to get rolled up and does not get dropped due to lossy behavior of rollups.

Co-authored-by: Manish R Jain <[email protected]>
Co-authored-by: NamanJain8 <[email protected]>
(cherry picked from commit cbdc991)
NamanJain8 added a commit that referenced this pull request Jan 15, 2021
ISSUE:
In a load that has a lot of updations to a key, a lot of deltas accumulate over time. These keys are added to the batch for a rollup. Due to the lossy behavior of rollups, they get dropped while the dgraph is busy rolling up keys (that might have much smaller delta count).

FIX:
We now add keys to rollupBatch just after pushing the deltas over to badger.
We now have a priority inside the keys that we rollup. The keys for which the deltas have increased up to a certain limit (hard-coded 500), we add it to high priority rollupKeyPool so that it has high chances to get rolled up and does not get dropped due to lossy behavior of rollups.

Co-authored-by: Manish R Jain <[email protected]>
Co-authored-by: NamanJain8 <[email protected]>
(cherry picked from commit cbdc991)

Co-authored-by: Ibrahim Jarif <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants