Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(txn): de-duplicate the context keys and predicates #7478

Merged
merged 4 commits into from
Mar 4, 2021

Conversation

NamanJain8
Copy link
Contributor

@NamanJain8 NamanJain8 commented Feb 24, 2021

The zero raft loop got stuck because the proposals took a long time to process. Hence, the goroutine which checks the quorum gets struck leading to leadership change (even in single zero single alpha).

Update: The real issue was the Keys in txnContext. While merging the conflict keys, deduplication was not done. Do deduplication before sending the request to zero.

The issue is due to lock contention.
https://github.com/dgraph-io/dgraph/blob/c66e86985b642eb6b563014ed05389e7b3170f72/dgraph/cmd/zero/oracle.go#L132-L151 is holding the lock, while this function tries to send update https://github.com/dgraph-io/dgraph/blob/c66e86985b642eb6b563014ed05389e7b3170f72/dgraph/cmd/zero/oracle.go#L273-L283.

The processing of the keyCommit map is slow and the locks for o.commits and o.keyCommit can be separated by using sync.Map for o.commits.


This change is Reviewable

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the proposal application out of the Zero Raft loop, into an applyCh style application, like we do in Alpha. That would tackle the slowness issues.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @NamanJain8 and @vvbalaji-dgraph)

@NamanJain8 NamanJain8 force-pushed the naman/keycommit-lock branch from e978b65 to 0d391e6 Compare March 2, 2021 09:56
@NamanJain8 NamanJain8 requested a review from martinmr as a code owner March 2, 2021 09:56
@NamanJain8 NamanJain8 changed the title perf(raft): separate out the keyCommit lock from oracle lock fix(txn): use map instead of list in txnContext Mar 2, 2021
Copy link
Contributor Author

@NamanJain8 NamanJain8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed the real issue was with txnContext.Keys containing duplicates. So, we don't need that Raft loop change.

Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @martinmr, @NamanJain8, and @vvbalaji-dgraph)

@NamanJain8 NamanJain8 changed the title fix(txn): use map instead of list in txnContext fix(txn): de-duplicate the context keys and predicates Mar 2, 2021
@NamanJain8 NamanJain8 changed the title fix(txn): de-duplicate the context keys and predicates perf(txn): de-duplicate the context keys and predicates Mar 2, 2021
Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @NamanJain8 🎉

@NamanJain8 NamanJain8 merged commit 604cf63 into master Mar 4, 2021
@NamanJain8 NamanJain8 deleted the naman/keycommit-lock branch March 4, 2021 08:06
aman-bansal pushed a commit that referenced this pull request Mar 9, 2021
The zero raft loop got stuck because the proposals took a long time to process. Hence, the goroutine which checks the quorum gets struck leading to leadership change (even in single zero single alpha).

The real issue was the Keys in txnContext. While merging the conflict keys, deduplication was not done. Do deduplication before sending the request to zero.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants