Thread Safety #62
We don't do anything special, so I assume the underlying solver is not thread safe.
I did a quick run with GDB and I get the segfault here. But maybe that's not the actual point of the segfault, and Julia just happened to GC when this happened. Here's the GDB backtrace:
```
Thread 3 "julia" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe2f3b700 (LWP 199218)]
0x00007ffff752c3e3 in jl_mutex_wait (lock=0x7ffff7888920 <codegen_lock>, safepoint=1)
    at /home/lassepe/worktree/julia/src/julia_locks.h:34
34          jl_gc_safepoint_(self->ptls);
(gdb) bt
#0  0x00007ffff752c3e3 in jl_mutex_wait (lock=0x7ffff7888920 <codegen_lock>, safepoint=1)
    at /home/lassepe/worktree/julia/src/julia_locks.h:34
#1  0x00007ffff752c536 in jl_mutex_lock (lock=0x7ffff7888920 <codegen_lock>)
    at /home/lassepe/worktree/julia/src/julia_locks.h:85
#2  0x00007ffff752db6d in jl_generate_fptr (mi=0x7ffff0a17350, world=31344)
    at /home/lassepe/worktree/julia/src/jitlayers.cpp:316
#3  0x00007ffff7481916 in jl_compile_method_internal (mi=0x7ffff0a17350, world=31344)
    at /home/lassepe/worktree/julia/src/gf.c:1980
#4  0x00007ffff748275e in _jl_invoke (F=0x7ffff086f1b0, args=0x7fffef1f3408, nargs=0, mfunc=0x7ffff0a17350, world=31344)
    at /home/lassepe/worktree/julia/src/gf.c:2239
#5  0x00007ffff7483185 in jl_apply_generic (F=0x7ffff086f1b0, args=0x7fffef1f3408, nargs=0)
    at /home/lassepe/worktree/julia/src/gf.c:2429
#6  0x00007ffff74a8cdc in jl_apply (args=0x7fffef1f3400, nargs=1) at /home/lassepe/worktree/julia/src/julia.h:1788
#7  0x00007ffff74aac20 in start_task () at /home/lassepe/worktree/julia/src/task.c:877
```
That just means the issue is probably in us cleaning up an instance of Path while another is running, and that the two instances share some global state in C. If I recall correctly, there's a workspace feature of the C library that we aren't using at the moment, but I'm not sure if that would help.
Thank you for your swift response. I dug a little deeper and it indeed seems like this is a global-state issue inside PATH. At least the function signature of the solver suggests that there is only one global solver object: `FUN_DECL(Void) Path_Create(Int maxSize, Int maxNNZ)`. PATHSolver.jl never calls that directly, but I guess that would be created as a result of the call to
Yes. We can probably fix this by manually creating and managing our own workspace, but I don't know what this entails. You'd have to look into the C code. I'll review a PR if you have time to work on it 😄 You can find the C header files here: http://www.cs.wisc.edu/~ferris/path/path_5.0.05_Linux64.zip
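In the meantime, one possible stopgap (not part of PATHSolver.jl; a minimal sketch assuming it is acceptable to serialize the solves) is to guard every call into PATH with a single lock on the Julia side, so that only one task is ever inside the non-thread-safe library at a time:
```julia
import PATHSolver

# Hypothetical wrapper, not part of PATHSolver.jl: one global lock so that
# only one task can be inside the non-thread-safe PATH library at any time.
const PATH_LOCK = ReentrantLock()

function locked_solve_mcp(args...; kwargs...)
    return lock(PATH_LOCK) do
        PATHSolver.solve_mcp(args...; kwargs...)
    end
end
```
This gives no parallel speedup for the solves themselves, but it would let PATH coexist with otherwise multithreaded code.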
A colleague of mine contacted the PATH team and Steven Dirkse replied:
Therefore, I guess there's not much we can do on this end. A note on this issue in the README or somewhere in the docs may be the best solution for now.
Okay, I'll add a note. Thanks for digging into this further!
I'll also say that threading might be broken, but you can still use process-based parallelism:
```julia
import Distributed
Distributed.addprocs(4)
Distributed.@everywhere begin
    import Pkg
    Pkg.activate("/tmp/path")  # Or wherever your environment is
end
Distributed.@everywhere begin
    import PATHSolver
    import SparseArrays
    function solve_example(i)
        println("Solving i=$i on node $(Distributed.myid())")
        M = convert(
            SparseArrays.SparseMatrixCSC{Cdouble,Cint},
            SparseArrays.sparse([
                0 0 -1 -1
                0 0 1 -2
                1 -1 2 -2
                1 2 -2 4
            ]),
        )
        return PATHSolver.solve_mcp(
            M,
            Cdouble[2, 2, -2, -6],
            fill(0.0, 4),
            fill(10.0, 4),
            Cdouble[0.0, 0.0, 0.0, 0.0],
            output = "no",
        )
    end
end
results = Distributed.pmap(solve_example, 1:10)
```
Right, multi-processing works because each process loads the library separately. Unfortunately, in my current use case the problems are very small, so multi-processing adds too much overhead to provide an overall speedup. Still good to keep in mind for other settings, though. Should I close this issue since it's not really actionable, or do you want to keep it around as an additional warning/reminder?
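If the per-call messaging overhead of `pmap` dominates for very small problems, batching several problems per remote call can amortize it to some extent; a sketch reusing the `solve_example` function defined above:
```julia
# Reuses solve_example from the @everywhere block above. batch_size is a
# standard keyword of Distributed.pmap that groups elements per remote call,
# reducing the per-problem messaging overhead.
results = Distributed.pmap(solve_example, 1:1_000; batch_size = 50)
```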
I'll close it when I merge #63.
When I call the PATH solver from multiple threads at the same time, I reliably run into a segfault. Is there a workaround to avoid this?
Here is an MWE (make sure to start the Julia REPL with multiple threads):
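A minimal sketch along those lines, reusing the `solve_mcp` call from the `pmap` example above (assumed for illustration, not necessarily the original snippet), started with e.g. `julia --threads=4`:
```julia
import PATHSolver
import SparseArrays

function solve_example(i)
    M = convert(
        SparseArrays.SparseMatrixCSC{Cdouble,Cint},
        SparseArrays.sparse([
            0 0 -1 -1
            0 0 1 -2
            1 -1 2 -2
            1 2 -2 4
        ]),
    )
    return PATHSolver.solve_mcp(
        M,
        Cdouble[2, 2, -2, -6],
        fill(0.0, 4),
        fill(10.0, 4),
        Cdouble[0.0, 0.0, 0.0, 0.0],
        output = "no",
    )
end

# Calling the solver concurrently from multiple threads reliably segfaults,
# since the underlying PATH library relies on global state.
Threads.@threads for i in 1:10
    solve_example(i)
end
```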