Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(core): Fix performance issue in type filter #9065

Merged
merged 7 commits into from
Apr 12, 2024

Conversation

harshil-goel
Copy link
Contributor

@harshil-goel harshil-goel commented Apr 5, 2024

Currently when we do queries like func(uid: 0x1) @filter(type). We retrieve the entire type index. Sometimes, when the index is too big, fetching the index is quite slow. We realised that if we know we only want to check few uids are of the same, then we can just check those uids directly. Right now we are hard coding the number of uids threshold. This could be improved with a more statistical based model, where we figure out how many items does the type index have, how many we need to check.

@dgraph-bot dgraph-bot added area/core internal mechanisms go Pull requests that update Go code labels Apr 5, 2024
@harshil-goel harshil-goel force-pushed the harshil-goel/fix-type-perf branch 2 times, most recently from b31cea7 to 8375b71 Compare April 10, 2024 09:47
damonfeldman
damonfeldman previously approved these changes Apr 12, 2024
posting/list.go Outdated
numNormalPostingsRead := 0
defer func() {
if numNormalPostingsRead < numDeletePostingsRead {
glog.V(3).Infof("During iterate on posting list, we read %d set postings, %d delete postings"+

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify this message in some way to be more useful to someone who is not familiar with internal workings? Also include which posting list if that is available.

I think this means that the badger values representing a posting list (in 256KB chunks IIRC) contained many deleted structures, and are doing more than 50% of the data movement for non-useful deleted data.

If so something like: "High proportion of deleted data observed for posting list {l.key}: total = {numNormal+numdeleted}, percent deleted = {numDel / (numnormal+numdel) * 100}%".

@@ -48,7 +48,7 @@ const (
`client_key=; sasl-mechanism=PLAIN; tls=false;`
LimitDefaults = `mutations=allow; query-edge=1000000; normalize-node=10000; ` +
`mutations-nquad=1000000; disallow-drop=false; query-timeout=0ms; txn-abort-after=5m; ` +
` max-retries=10;max-pending-queries=10000;shared-instance=false`
` max-retries=10;max-pending-queries=10000;shared-instance=false;type-filter-uid-limit=10`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this... If a customer is large enough to have performance concerns, my guess is the dgraph.type index is going to be very large (or include some very large index entries that drive the performance profile). In that case, perhaps we should optimize for this case, and consider what uid-limit will balance performance for 1M or more UIDs? I suspect that will be more like 100 or more.

Maybe not something to delay/retest for, but consider.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The number 10 is too small for such optimization. But we can defer this until we see another case and get some validation of type-filter-uid-limit.

worker/task.go Show resolved Hide resolved
Copy link
Contributor

@meghalims meghalims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@harshil-goel harshil-goel merged commit 724e4db into release/v23.1 Apr 12, 2024
10 checks passed
@harshil-goel harshil-goel deleted the harshil-goel/fix-type-perf branch April 12, 2024 18:57
harshil-goel added a commit that referenced this pull request May 15, 2024
Currently when we do queries like `func(uid: 0x1) @filter(type)`. We
retrieve the entire type index. Sometimes, when the index is too big,
fetching the index is quite slow. We realised that if we know we only
want to check few `uids` are of the same, then we can just check those
`uids` directly. Right now we are hard coding the number of `uids`
threshold. This could be improved with a more statistical based model,
where we figure out how many items does the type index have, how many we
need to check.
harshil-goel added a commit that referenced this pull request May 15, 2024
Currently when we do queries like `func(uid: 0x1) @filter(type)`. We
retrieve the entire type index. Sometimes, when the index is too big,
fetching the index is quite slow. We realised that if we know we only
want to check few `uids` are of the same, then we can just check those
`uids` directly. Right now we are hard coding the number of `uids`
threshold. This could be improved with a more statistical based model,
where we figure out how many items does the type index have, how many we
need to check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core internal mechanisms go Pull requests that update Go code
Development

Successfully merging this pull request may close these issues.

5 participants