Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vmprof causes process to hang #116

Closed
jogo opened this issue Feb 16, 2017 · 9 comments
Closed

vmprof causes process to hang #116

jogo opened this issue Feb 16, 2017 · 9 comments

Comments

@jogo
Copy link
Contributor

jogo commented Feb 16, 2017

When enabling vmprof in a large gevent based server process, it outputs data for a few minutes then causes everything to hang.

when attaching gdb to the process and looking at the backtrace I get the following (confirmed I saw the same trace twice):

#1  0x000000000054a8f6 in find_key.47505 ()
#2  0x000000000054ad3a in PyGILState_GetThisThreadState ()
#3  0x00002b268fa19574 in sigprof_handler (sig_nr=<optimized out>, info=<optimized out>, ucontext=<optimized out>) at src/vmprof_main.h:170
#4  <signal handler called>
#5  0x000000000054a95d in find_key.47505 ()
#6  0x00000000004f3f30 in PyGILState_Release ()
#7  0x00002b26913f0f97 in ev_invoke_pending (loop=0x2b2691644620) at libev/ev.c:3155
#8  0x00002b26914206f8 in ev_run (loop=0x2b2691644620, flags=flags@entry=0) at libev/ev.c:3555
#9  0x00002b2691420fa4 in __pyx_pf_6gevent_8corecext_4loop_14run (__pyx_v_self=0x3abcc20, __pyx_v_once=False, __pyx_v_nowait=<optimized out>)
    at gevent/gevent.corecext.c:7017
#10 __pyx_pw_6gevent_8corecext_4loop_15run (__pyx_v_self=<gevent.corecext.loop at remote 0x3abcc20>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>)
    at gevent/gevent.corecext.c:6884
#11 0x0000000000536f94 in PyEval_EvalFrameEx ()
#12 0x00002b268fa189da in vmprof_eval (f=<optimized out>, throwflag=<optimized out>) at src/_vmprof.c:55
#13 0x000000000053e2b0 in PyEval_EvalCodeEx ()
#14 0x000000000051d0a0 in function_call ()
#15 0x00000000004c2646 in PyObject_Call ()
#16 0x0000000000531dea in instancemethod_call.8548 ()
#17 0x00000000004c2646 in PyObject_Call ()
#18 0x00000000004c2aa6 in PyEval_CallObjectWithKeywords ()
#19 0x00002b26911e0b2d in g_initialstub (mark=<optimized out>) at greenlet.c:798
#20 0x00002b26911e0480 in g_switch (target=0x3629370, args=(), kwargs=0x0) at greenlet.c:570
#21 0x00000000004b39f7 in PyObject_CallFunctionObjArgs ()
#22 0x00007ffdde724f61 in ?? ()
#23 0x0000000000000008 in ?? ()
#24 0x000000000050b5a7 in PyString_FromString ()
#25 0x00000000004ef3b4 in import_submodule.39103 ()
#26 0x000000001212c3b0 in ?? ()
#27 0x0000000012132990 in ?? ()
#28 0x000000000051d030 in ?? ()
#29 0x000000001214bb18 in ?? ()
#30 0x0000000012197ae8 in ?? ()
#31 0x0000000000000001 in ?? ()
#32 0x0000000012197b50 in ?? ()
#33 0x00007ffdde726540 in ?? ()
#34 0x000000000051d0a0 in function_call ()
#35 0x00000000004c2646 in PyObject_Call ()
#36 0x0000000000531e06 in instancemethod_call.8548 ()
#37 0x00007ffdde725360 in ?? ()
#38 0x00007ffdde725290 in ?? ()
#39 0x0000000000553507 in tupledealloc.24200 ()
#40 0x000000000089f3e0 in ?? ()
#41 0x00007ffdde7275d0 in ?? ()
#42 0x000000000089f3e0 in ?? ()
#43 0x000000001d0652b0 in ?? ()
#44 0x0000000001fe17f0 in ?? ()
#45 0x0000000000000001 in ?? ()
#46 0x0000000000561ed2 in PyErr_Format ()
#47 0x0000000000484356 in find_module.38892 ()
#48 0x00000000004ef3b4 in import_submodule.39103 ()
#49 0x000000001d1565cf in ?? ()
#50 0x00007ffdde7286b8 in ?? ()
#51 0x000000000050ed81 in PyString_FromFormatV ()
#52 0x0000000000561ed2 in PyErr_Format ()
#53 0x00000000008a9860 in ?? ()
#54 0x00000000ffffffff in ?? ()
#55 0x00000000ffffffff in ?? ()
#56 0x0000000002d6fc36 in ?? ()
#57 0x000000001d2613c0 in ?? ()
#58 0x00000000008a9860 in ?? ()
#59 0x00000000ffffffff in ?? ()
#60 0x00007ffdde728810 in ?? ()
#61 0x00000000008a9860 in ?? ()
#62 0x000000001d2f48dc in ?? ()
#63 0x00000000004efd54 in import_module_level.isra.3.39129 ()
#64 0x0000000000000000 in ?? ()

I am not sure what other information would be useful to debug this issue.

@jogo
Copy link
Contributor Author

jogo commented Feb 16, 2017

Besides this issue, I have found vmprof to be very useful and look forward to using it more in the future, thanks for such as great tool!

@planrich
Copy link
Contributor

Can you post some more information about your system? If have seen this issue once, but could not reproduce it. PyGILState_GetThisThreadState seems not to be able to acquire the lock and hangs forever.

@jogo
Copy link
Contributor Author

jogo commented Feb 16, 2017

Python version: python 2.7.3
OS: Ubuntu 12.04 with 4.2.0 kernel
gevent version: 1.1.2
server: based on http://www.gevent.org/gevent.pywsgi.html

I am trying to get our system running under python-dbg so I can hopefully get more information about about what is running when it hangs forever. After that I can try to create a self contained code base that can reliably cause the issue.

@planrich is there any other information that you want?

@jogo
Copy link
Contributor Author

jogo commented Feb 16, 2017

got python with debug information working:

(gdb) bt
#0  0x00002ba98b6e8fd0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000004f3238 in PyThread_acquire_lock (lock=0x17a8460, waitflag=1) at Python/thread_pthread.h:321
#2  0x00000000004f340c in find_key (key=1, value=0x0) at Python/thread.c:268
#3  0x00000000004f363e in PyThread_get_key_value (key=1) at Python/thread.c:360
#4  0x00000000004dca78 in PyGILState_GetThisThreadState () at Python/pystate.c:581
#5  0x00002ba9b3bf8574 in sigprof_handler (sig_nr=<optimized out>, info=<optimized out>, ucontext=<optimized out>) at src/vmprof_main.h:170
#6  <signal handler called>
#7  PyThread_release_lock (lock=0x17a8460) at Python/thread_pthread.h:342
#8  0x00000000004f3516 in find_key (key=1, value=0x0) at Python/thread.c:296
#9  0x00000000004f363e in PyThread_get_key_value (key=1) at Python/thread.c:360
#10 0x00000000004dcb56 in PyGILState_Release (oldstate=PyGILState_UNLOCKED) at Python/pystate.c:622
#11 0x00002ba98e59af97 in ev_invoke_pending (loop=0x2ba98e7ee620) at libev/ev.c:3155
#12 0x00002ba98e5ca6f8 in ev_run (loop=0x2ba98e7ee620, flags=flags@entry=0) at libev/ev.c:3555
#13 0x00002ba98e5cafa4 in __pyx_pf_6gevent_8corecext_4loop_14run (__pyx_v_self=0x2c6af80, __pyx_v_once=False, __pyx_v_nowait=<optimized out>) at gevent/gevent.corecext.c:7017
#14 __pyx_pw_6gevent_8corecext_4loop_15run (__pyx_v_self=<gevent.corecext.loop at remote 0x2c6af80>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at gevent/gevent.corecext.c:6884
#15 0x000000000053241b in PyCFunction_Call (func=<built-in method run of gevent.corecext.loop object at remote 0x2c6af80>, arg=(), kw=0x0) at Objects/methodobject.c:85
#16 0x00000000004b491c in call_function (pp_stack=0x7ffc3fd584e8, oparg=0) at Python/ceval.c:4021
#17 0x00000000004b0d6c in PyEval_EvalFrameEx (f=
    Frame 0x2e298b0, for file /usr/local/lib/python2.7/site-packages/gevent/hub.py, line 667, in run (self=<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method relea...(truncated), throwflag=0) at Python/ceval.c:2666
#18 0x00000000004b29f9 in PyEval_EvalCodeEx (co=0x2ba98b671c30, globals=
    {'get_hub_class': <function at remote 0x2ba98b67d7d0>, '_NONE': <_NONE at remote 0x2ba98b52c0b0>, '_DEFAULT_LOOP_CLASS': 'gevent.core.loop', 'xrange': <type at remote 0x7b5420>, 'PY2': True, 'PY3': False, 'MAIN_THREAD': 48007186633408, 'absolute_import': <_Feature(mandatory=(2, 7, 0, 'alpha', 0), optional=(2, 5, 0, 'alpha', 1), compiler_flag=16384) at remote 0x2ba98b6643f8>, 'resolver_config': <function at remote 0x2ba98b67daa0>, 'threadlocal': <type at remote 0x7d27e0>, 'InvalidSwitchError': <type at remote 0x18d2380>, 'sleep': <function at remote 0x2ba98b67d320>, 'reraise': <function at remote 0x2ba98b67d140>, 'set_hub': <function at remote 0x2ba98b67d938>, 'LoopExit': <type at remote 0x18d1640>, '_MultipleWaiter': <type at remote 0x18ddf90>, 'GreenletExit': <type at remote 0x18cf570>, '__all__': ['getcurrent', 'GreenletExit', 'spawn_raw', 'sleep', 'kill', 'signal', 'reinit', 'get_hub', 'Hub', 'Waiter'], '_get_hub': <function at remote 0x2ba98b67d8c0>, '__package__': None, 'Waiter': <type at remote 0x18dce7...(truncated), locals=0x0, args=0x272a568, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253
#19 0x000000000052ff29 in function_call (func=<function at remote 0x2ba98b6811b8>, arg=
    (<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock o
---Type <return> to continue, or q <return> to quit---
bject at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acq...(truncated), kw=0x0) at Objects/funcobject.c:526
#20 0x000000000041f1d3 in PyObject_Call (func=<function at remote 0x2ba98b6811b8>, arg=
    (<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acq...(truncated), kw=0x0) at Objects/abstract.c:2529
#21 0x0000000000427b82 in instancemethod_call (func=<function at remote 0x2ba98b6811b8>, arg=
    (<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acq...(truncated), kw=0x0) at Objects/classobject.c:2578
#22 0x000000000041f1d3 in PyObject_Call (func=<instancemethod at remote 0x2ccc1e0>, arg=(), kw=0x0) at Objects/abstract.c:2529
#23 0x00000000004b41d9 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x2ccc1e0>, arg=(), kw=0x0) at Python/ceval.c:3890
#24 0x00002ba98e38a9fd in g_initialstub (mark=<optimized out>) at greenlet.c:798
#25 0x00002ba98e38a360 in g_switch (target=0x2cb87d0, args=(), kwargs=0x0) at greenlet.c:570
#26 0x000000001bf2dc70 in ?? ()
#27 0x00000000007dd910 in ?? ()
#28 0x0000000000464461 in string_item (a=0x1bfb37d8, i=24806496) at Objects/stringobject.c:1184
#29 0x000000000052887f in classmethod_get (descr=0x1bf2dc70, obj=<unknown at remote 0x1bfb37d8>, type=<unknown at remote 0x2007dd910>) at Objects/descrobject.c:116
#30 0x0000000000000001 in ?? ()
#31 0x00007ffc3fd58ca0 in ?? ()
#32 0x00000000004547a9 in PyObject_RichCompareBool (v=<unknown at remote 0x48ce894800000000>, w=<unknown at remote 0x-4527b274b7dfbf75>, op=1223181707) at Objects/object.c:999
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

(gdb) py-bt
#17 Frame 0x2e298b0, for file /usr/local/lib/python2.7/site-packages/gevent/hub.py, line 667, in run (self=<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method relea...(truncated)
    loop.run()
(gdb) py-list
 662            assert self is getcurrent(), 'Do not call Hub.run() directly'
 663            while True:
 664                loop = self.loop
 665                loop.error_handler = self
 666                try:
>667                    loop.run()
 668                finally:
 669                    loop.error_handler = None  # break the refcount cycle
 670                self.parent.throw(LoopExit('This operation would block forever', self))
 671            # this function must never return, as it will cause switch() in the parent greenlet
 672            # to return an unexpected value
0x00002ba98b6e8fd0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) list
14               * Python requires non-stop mode.  Alas, some platforms enable FP
15               * exceptions by default.  Here we disable them.
16               */
17      #ifdef __FreeBSD__
18              fp_except_t m;
19
20              m = fpgetmask();
21              fpsetmask(m & ~FP_X_OFL);
22      #endif
23              return Py_Main(argc, argv);

Also our process is single threaded.

@planrich
Copy link
Contributor

Thanks, as I see things now the reason for this is the following:

PyGILState_GetThisThreadState looks up the thread local state which uses a mutex (keymutex in find_key, Python/thread.c). Assuming the following (as shown in the gdb stack trace in your first post):

PyGILState_Release also calls find_key and the signal occurs just after the keymutex has been acuqired. The signal in PyGILState_GetThisThreadState (which also tries to acquire keymutex) will hang, forever.

This could have happened most of the previous releases already. Thinking about a fix for that.

@planrich
Copy link
Contributor

so good news, I have worked around the issue described above and uploaded a pre release (version 0.4.2.dev1). Could you install that version an try again? pip install vmprof --pre should do

@jogo
Copy link
Contributor Author

jogo commented Feb 17, 2017

@planrich thanks, I tested out 0.4.2.dev1 and its now working!

@jogo
Copy link
Contributor Author

jogo commented Feb 23, 2017

Fix has been released in 0.4.2, closing

@jogo jogo closed this as completed Feb 23, 2017
@planrich
Copy link
Contributor

Released a fix, closing this issue.

@planrich planrich reopened this Feb 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants