Deadlock in ngx_http_lua_shdict_set_helper #1207
Comments
Maybe the ngx.shared.DICT.set API was called in a garbage-collection metamethod (a `__gc` finalizer) and lua_pushlstring triggered the garbage collection.
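For illustration, here is a minimal Lua sketch of the pattern being described (the dictionary name `my_cache`, the key names, and the finalizer are hypothetical). With the default interpreted shdict API, `get()` holds the shared-memory mutex while `lua_pushlstring` copies the value onto the Lua stack; if that allocation triggers a LuaJIT GC step, a `__gc` finalizer that calls `set()` on the same dict will try to take the same mutex again and the worker deadlocks.

```lua
-- Minimal sketch of the dangerous pattern (hypothetical names; assumes
-- `lua_shared_dict my_cache 10m;` in nginx.conf).

local cache = ngx.shared.my_cache

-- A userdata whose __gc finalizer writes to the same shared dict.
local sentinel = newproxy(true)
getmetatable(sentinel).__gc = function()
    -- Runs at an unpredictable time, possibly in the middle of a get()
    -- on the same dict, while the shdict mutex is already held.
    cache:set("last_cleanup", ngx.now())
end

-- An ordinary get() copies the value with lua_pushlstring() while holding
-- the shdict lock; if that allocation triggers a GC step, the finalizer
-- above calls set(), which then blocks forever on the same lock.
local value = cache:get("some_key")
```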
@gohai A quick summary of the topic referred to above: one of the solutions is to use the FFI-based shdict API from lua-resty-core.
Thank you @qlees @spacewander. Is switching to the FFI-based lua-resty-core API enough to fix this on our end?
@gohai Simply adding `require "resty.core"` in the `init_by_lua` context is enough. See https://github.com/openresty/lua-resty-core
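For reference, a minimal sketch of the suggested fix, assuming an otherwise ordinary OpenResty configuration (the dictionary name is illustrative). Loading lua-resty-core replaces the interpreted shdict API with its FFI-based implementation, which avoids allocating Lua strings while the shdict lock is held.

```nginx
# Minimal sketch; only the init_by_lua_block line is the actual fix.
http {
    lua_shared_dict my_cache 10m;   # illustrative dict name

    init_by_lua_block {
        require "resty.core"
    }

    # ... the rest of the existing configuration ...
}
```

In OpenResty 1.15.8.1 and later, lua-resty-core is loaded by default, so this step is only needed on older versions such as the 1.13.6.1 used here.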
Thanks! We will try this in a few hours.
We have been running with your suggested change for about two hours now, and so far everything has been working fine. @qlees @spacewander For similar situations in the future, is there a suggested way to detect unresponsive worker processes?
Right, this is a known problem and we're going to enforce the use of the lua-resty-core library in OpenResty in the near future. @gohai The nginx master process (or the …
Thanks for your reply! @agentzh According to our logs and statistics, the instance was not able to process any requests after the first worker deadlocked. I believe the second worker stalled immediately afterwards, waiting for the same lock.

We were using the TCP health check on an AWS load balancer. It did not trigger in this situation, which is very unfortunate, and I would love to find out why, and how we could make it work next time. Perhaps the health check only tests whether the port is open ("Failure to connect within the configured timeout is considered unhealthy."). So if there were a way to close the port, or quit nginx when the workers are unresponsive...
@gohai I think you should enable an HTTP health check instead of a TCP health check. We're not maintaining the AWS load balancer anyway, so you could also roll out your own health checker if it does not work for you. This is already out of the scope of this issue and this project.
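To make the health-check suggestion concrete, here is a minimal sketch of an HTTP check endpoint (the port, path, and dictionary name are made up). A bare TCP check can keep passing while every worker is wedged, because the kernel completes the handshake from the listen backlog; an HTTP check only passes if a worker actually processes the request within the configured timeout.

```nginx
# Minimal sketch of an HTTP health-check target (illustrative names).
server {
    listen 8080;

    location = /healthz {
        content_by_lua_block {
            -- Touch the shared dict so the probe exercises the same lock
            -- that wedged the workers; if the lock is stuck, this request
            -- hangs and the load balancer's timeout marks the instance
            -- unhealthy instead of merely checking that the port is open.
            local ok, err = ngx.shared.my_cache:set("healthcheck", ngx.now())
            if not ok then
                ngx.status = 500
                ngx.say("unhealthy: ", err)
                return
            end
            ngx.say("ok")
        }
    }
}
```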
Consider it resolved. |
Hello,
We are using OpenResty 1.13.6.1 (1.el7.centos) together with the lua-resty-auto-ssl module to terminate HTTPS traffic for multiple domains.
One of our instances randomly stopped responding to traffic and did not recover on its own. It turned out that both nginx worker processes appeared to be wedged, waiting on a futex.
Stacktraces on both processes look rather similar, although the details vary. Here is one:
```
ngx_http_lua_ssl_cert_handler
  -> ngx_http_lua_ssl_cert_by_chunk
  -> ngx_http_lua_run_thread
  -> ngx_http_lua_shdict_get_helper
  -> lua_pushlstring
  -> lj_gc_step
  -> gc_onestep
  -> gc_finalize
  -> gc_call_finalizer
  -> lj_BC_FUNCC
  -> ngx_http_lua_shdict_set_helper
  -> ngx_shmtx_lock
  -> sem_wait@@GLIBC_2.2.5
```
I am wondering whether the apparent recursion between `ngx_http_lua_shdict_get_helper` and `ngx_http_lua_shdict_set_helper` might have caused an issue, as I see both functions acquiring the same lock. Or is there some subtle race condition between the two worker processes?

Here is the stacktrace of the first worker process:
Here is the stacktrace of the second worker process:
Please let me know if there is anything we can do to help fix this. Thank you!