Possible deadlock when using an aggregator #2914
Comments
Nothing stands out to me from the stack trace; anything interesting in the logfile?
Thank you for your help.
Here is the whole stack trace.
Wow, that is a lot of http_listener goroutines. I'll have to think about how we should deal with that... Can you share the code for this?
Thank you for your reply. Here is the sum aggregator code.
I've found the cause of this: the running_aggregator can be blocked during push, which would in turn block the add function. This would prevent items from being added to the output and stall the entire process.
Thank you for your reply. I think when you say the running_aggregator could be blocked during push, you actually mean that metricC is full (the running_aggregator pushes the aggregated metrics to metricC), or that the running_aggregator could be blocked during push because the periodT ticker failed.
Yes, I believe the aggregator is blocked by metricC being full. I think if metricC and the internal metrics channel both fill, then you would be stuck. All the inputs use metricC as well, so I think it could possibly happen under load.
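To illustrate that failure mode, here is a toy sketch (not Telegraf's actual code; metricC here is just a stand-in buffered channel) of a sender wedged on a full, undrained channel:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Stand-in for the shared metrics channel; the real buffer size and
	// metric types differ, this only illustrates the backpressure.
	metricC := make(chan string, 1)

	// The "push" side: sends aggregated metrics into metricC.
	push := func(m string) {
		metricC <- m // blocks once metricC is full and nothing drains it
	}

	done := make(chan struct{})
	go func() {
		push("agg-metric-1")
		push("agg-metric-2") // buffer already full, no consumer: stuck here
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("pushed everything")
	case <-time.After(time.Second):
		// As described in the thread: a push stuck like this also stalls
		// add, so nothing moves anymore.
		fmt.Println("push is stuck: metricC is full and nothing is reading it")
	}
}
```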
Will you fix these bugs in the future? For now, I plan to use socket_listener instead of http_listener.
Yes, I'm going to work on this in the next week. It will probably go out in the 1.4 release next month, but you could easily backport it since you already have modifications.
Hi, danielnelson. The problem is that the add goroutine and the push goroutine may manipulate the aggregator's map concurrently, causing a fatal error: concurrent map iteration and map write.
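For reference, a minimal standalone reproduction of that failure mode (the names are illustrative, not the actual aggregator code): one goroutine ranges over a map while another writes to it, which usually crashes almost immediately with the same fatal error.

```go
package main

// One goroutine ranges over a map (as a push might) while another writes to
// it (as an add might). Running this typically aborts quickly with
// "fatal error: concurrent map iteration and map write".
func main() {
	cache := map[string]float64{"requests": 0}

	go func() {
		for i := 0; ; i++ {
			cache["requests"] = float64(i) // concurrent write
		}
	}()

	for {
		total := 0.0
		for _, v := range cache { // concurrent iteration
			total += v
		}
		_ = total
	}
}
```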
Yeah, that patch is fundamentally flawed. The real fix is going to require removing the loop in metric processing: where we push metrics from the aggregator back into the processors. Unfortunately, this is going to be too large of a change for 1.3.3, so I'm going to have to push it to 1.4. |
Hi, danielnelson. I created a temporary variable to store the aggregator's state, then created a new goroutine to do the push. See the attached running_aggregator.go and sum.go.
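The attached files aren't reproduced here, but a rough sketch of that idea (a hypothetical sum aggregator; none of these names are the real running_aggregator.go or sum.go API) would be to snapshot and reset the cache under the lock, then push the copy from a separate goroutine:

```go
package main

import "sync"

type metric struct {
	name  string
	value float64
}

// sumAggregator is a hypothetical stand-in for the sum.go aggregator.
type sumAggregator struct {
	mu    sync.Mutex
	cache map[string]float64
}

func (s *sumAggregator) Add(m metric) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cache[m.name] += m.value
}

// snapshotAndReset swaps out the live cache under the lock, so the slow push
// never touches the map that Add is writing to.
func (s *sumAggregator) snapshotAndReset() map[string]float64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	snap := s.cache
	s.cache = make(map[string]float64)
	return snap
}

// PushAsync hands the snapshot to its own goroutine, so a full metric channel
// can only block that goroutine, not Add.
func (s *sumAggregator) PushAsync(metricC chan<- metric) {
	snap := s.snapshotAndReset()
	go func() {
		for name, v := range snap {
			metricC <- metric{name: name, value: v}
		}
	}()
}

func main() {
	agg := &sumAggregator{cache: make(map[string]float64)}
	metricC := make(chan metric, 100)
	agg.Add(metric{name: "requests", value: 1})
	agg.PushAsync(metricC)
	<-metricC
}
```

This keeps Add responsive even when the channel is full, at the cost of push goroutines piling up if the channel never drains.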
Seems like it should work to me. |
Bug report
version: master, release 1.3
Telegraf stops sending metrics after 3 days.
Relevant telegraf.conf:
System info:
[Include Telegraf version, operating system name, and other relevant details]
Steps to reproduce:
Expected behavior:
Actual behavior:
Additional info:
Send a SIGQUIT (^\) to the process.
[Include gist of relevant config, logs, etc.]