Dgraph Crashes Randomly While Running In GCE #1020

Closed
willcj33 opened this issue Jun 6, 2017 · 3 comments · Fixed by #1029
Labels
kind/bug Something is broken.

Comments


willcj33 commented Jun 6, 2017

panic: runtime error: index out of range

goroutine 244714 [running]:
github.com/dgraph-io/dgraph/algo.Difference(0xc44fa0e820, 0xc4534e41a0)
	/home/travis/gopath/src/github.com/dgraph-io/dgraph/algo/uidlist.go:242 +0x26d
github.com/dgraph-io/dgraph/query.ProcessGraph(0x1739f00, 0xc433e188d0, 0xc43cf2bb00, 0xc43cf2b200, 0xc44d294c60)
	/home/travis/gopath/src/github.com/dgraph-io/dgraph/query/query.go:1729 +0x1c93
created by github.com/dgraph-io/dgraph/query.ProcessGraph
	/home/travis/gopath/src/github.com/dgraph-io/dgraph/query/query.go:1701 +0x18ea
@manishrjain manishrjain added the kind/bug Something is broken. label Jun 6, 2017
@manishrjain

@tzdybal : Can you look into it immediately?


tzdybal commented Jun 7, 2017

It seems the bug is caused by a race condition that occurs when multiple filters are applied concurrently to the same subgraph. I'll try to reproduce it.

@willcj33 could you provide the problematic query?


tzdybal commented Jun 8, 2017

Detailed problem description:
When multiple filters are used, a filter expression tree is built.
For example, for the filter @filter(not(allofterms(name@en, "a")) and not(allofterms(name@en, "b")) and not(allofterms(name@en, "c"))), the tree looks like this:

and                                                                                                 
|                                                                                                   
+-> not                                                                                             
|    |                                                                                              
|    +-> allofterms(name@en, "a")                                                                   
|                                                                                                   
+-> and                                                                                             
    |                                                                                               
    +-> not                                                                                         
    |    |                                                                                          
    |    +-> allofterms(name@en, "b")                                                               
    |                                                                                               
    +-> not                                                                                         
         |                                                                                          
         +-> allofterms(name@en, "c")
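
For illustration, a tree of this shape could be modelled roughly as follows. This is only a sketch: `FilterNode`, `leaf`, `op` and `buildExample` are hypothetical names invented here, not Dgraph's actual types.

```go
package main

import (
	"fmt"
	"strings"
)

// FilterNode is a hypothetical node of a filter expression tree.
// Inner nodes carry an operator ("and", "not"); leaves carry a function call.
type FilterNode struct {
	Op       string
	Func     string
	Children []*FilterNode
}

// leaf builds a node that holds a filter function such as allofterms(...).
func leaf(fn string) *FilterNode { return &FilterNode{Func: fn} }

// op builds an operator node over the given children.
func op(name string, children ...*FilterNode) *FilterNode {
	return &FilterNode{Op: name, Children: children}
}

// String renders the tree in prefix form, e.g. and(not(x), not(y)).
func (n *FilterNode) String() string {
	if n.Op == "" {
		return n.Func
	}
	parts := make([]string, len(n.Children))
	for i, c := range n.Children {
		parts[i] = c.String()
	}
	return n.Op + "(" + strings.Join(parts, ", ") + ")"
}

// buildExample reproduces the shape of the tree drawn above:
// the top-level "and" has a "not" and a nested "and" as siblings.
func buildExample() *FilterNode {
	term := func(s string) *FilterNode {
		return leaf(fmt.Sprintf("allofterms(name@en, %q)", s))
	}
	return op("and",
		op("not", term("a")),
		op("and", op("not", term("b")), op("not", term("c"))),
	)
}

func main() {
	fmt.Println(buildExample())
}
```

The point of the shape is that a `not` node and an `and` node are siblings under the root, so they sit on the same level of the tree.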

Partial results (UID lists) are shared by all filters. At each level, filters are executed concurrently. The problem occurs when an `and` and a `not` from the same level are scheduled to execute at the same time. The intersection done by `and` may alter the UID list being accessed by the `not` operator, causing the crash.

To eliminate this data race, the algorithms should work on separate copies of the pointers to the UID lists (fortunately, copying the entire lists is not required).

tzdybal pushed a commit that referenced this issue Jun 8, 2017
tzdybal pushed a commit that referenced this issue Jun 9, 2017
tzdybal pushed a commit that referenced this issue Jun 11, 2017
In #1020 there was a data race that resulted in a crash (where `and`
and `not` filters were executed simultaneously).
This commit removes the data race between two `not` filters running in
parallel. It makes sure that the result of each `algo.Difference`
execution is stored in a separate (newly allocated) slice.
tzdybal pushed a commit that referenced this issue Jun 12, 2017
tzdybal pushed a commit that referenced this issue Jun 12, 2017
@manishrjain manishrjain added the kind/bug Something is broken. label Mar 21, 2018
@manishrjain manishrjain added the kind/bug Something is broken. label Mar 22, 2018