Topbeat: can't reliably retrieve proc.cpu.total_p value from ES #1009
Comments
To me it sounds a bit like a mapping issue. Can you check whether there's any difference in the mapping between the CPU and mem fields? Simply query the index by name and it will return its mapping. You might also try to see if you can reproduce the same issue with Kibana.
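A minimal sketch of that check, assuming the JSON returned by `GET /topbeat-*/_mapping` has already been loaded into a Python dict. Because topbeat writes to daily indices, the type of a field can differ from one index to the next, so the sketch reports the type per index instead of looking at a single one. The index names and the `process` document type in the sample data are hypothetical.

```python
def field_type(mappings, dotted_path):
    """Return the mapped type of a dotted field path, or None if it is absent.

    "proc.cpu.total_p" is looked up as
    properties -> proc -> properties -> cpu -> properties -> total_p -> type.
    """
    node = mappings
    for part in dotted_path.split("."):
        props = node.get("properties", {})
        if part not in props:
            return None
        node = props[part]
    return node.get("type")


def types_per_index(mapping_response, dotted_path):
    """Map each index name to the sorted list of types used for dotted_path.

    Handles the typeless layout ({"mappings": {"properties": ...}}) and the
    older layout with a document-type level ({"mappings": {"<type>": {...}}}).
    """
    result = {}
    for index, body in mapping_response.items():
        mappings = body.get("mappings", {})
        if "properties" in mappings:
            candidates = [mappings]
        else:
            candidates = [m for m in mappings.values() if isinstance(m, dict)]
        found = {field_type(m, dotted_path) for m in candidates} - {None}
        result[index] = sorted(found)
    return result


def sample_index(cpu_type):
    # Hypothetical mapping for one daily index, using a "process" document type.
    return {
        "mappings": {
            "process": {
                "properties": {
                    "proc": {
                        "properties": {
                            "cpu": {"properties": {"total_p": {"type": cpu_type}}},
                            "mem": {"properties": {"rss_p": {"type": "float"}}},
                        }
                    }
                }
            }
        }
    }


sample = {
    "topbeat-2016.02.09": sample_index("long"),   # created before the template
    "topbeat-2016.02.10": sample_index("float"),  # created after the template
}
print(types_per_index(sample, "proc.cpu.total_p"))
# → {'topbeat-2016.02.09': ['long'], 'topbeat-2016.02.10': ['float']}
print(types_per_index(sample, "proc.mem.rss_p"))
# → {'topbeat-2016.02.09': ['float'], 'topbeat-2016.02.10': ['float']}
```

Comparing every index the pattern matches matters here: checking just one index can report an identical mapping for both fields while an older index still has an integer type.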
The mapping is identical, I am afraid.
I'll try to reproduce it, but I'm currently traveling (along with the whole team), so I'm not sure when I'll get to it.
I'm having the same issue. I'm running a very similar setup to @nktl's (the only difference is that I'm on CentOS 7, which is actually not a big difference).
Hey, any luck with this, guys? It looks like the problem can in fact be replicated, as per @sickill's comment.
@nktl @sickill It might be a problem with calculating the CPU usage per process. Our implementation is similar to the psutil library, so you can check whether the results returned by that library are similar to what you expect. Are you comparing with the results reported by the top command?

    import psutil

    # Print the system-wide CPU usage 30 times, each sampled over 10 seconds
    for x in range(30):
        print(psutil.cpu_percent(interval=10))

Do you get approximately the same per-process CPU usage values with topbeat? Is the top command returning a different range of values?
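For comparison with `top`, here is a minimal, standard-library-only sketch of how a per-process CPU fraction of this kind can be derived: CPU time consumed between two samples divided by the wall-clock time between them, so 1.0 means one core fully busy. This is an assumption about how such values are typically computed, not topbeat's actual code, and it measures only the current Python process (via `os.times()`) rather than an arbitrary PID.

```python
import os
import time


def cpu_fraction(cpu_start, cpu_end, wall_start, wall_end):
    """CPU seconds used divided by wall-clock seconds elapsed.

    1.0 means one core was busy for the whole interval; multi-threaded
    processes can exceed 1.0.
    """
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        raise ValueError("wall-clock interval must be positive")
    return (cpu_end - cpu_start) / elapsed


def sample_self(interval):
    """Busy-loop for `interval` seconds and return this process's CPU fraction."""
    t = os.times()
    cpu0, wall0 = t.user + t.system, time.monotonic()
    while time.monotonic() - wall0 < interval:
        pass  # burn CPU so the result should land near 1.0
    t = os.times()
    return cpu_fraction(cpu0, t.user + t.system, wall0, time.monotonic())


print(cpu_fraction(10.0, 10.5, 100.0, 102.0))  # → 0.25
print(round(sample_self(0.5), 2))  # close to 1.0 on an otherwise idle core
```

A process using a quarter of one core yields 0.25, which is exactly the kind of value that would collapse to 0 if the field were stored as an integer.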
Thanks for chiming in, @monicasarbu. I don't think the problem is with the data collection procedure itself, as I can see correct data in Kibana for the per-process CPU metrics collected by topbeat. The issue is that whenever I try to retrieve those values from ES using the queries I posted in my first post, the values seem to get cast to 'int' for some crazy reason, so any decimal precision is lost (you essentially get either 0 or 1 back).
@nktl Thank you for the clarification. The problem might be that you have a mixture of data in Elasticsearch: some inserted before the template was applied (as int) and some after you loaded your template (as float). In this case Elasticsearch sometimes tries to convert the percentages (float) to the default mapping, which is "int".
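To make that explanation concrete, here is a toy Python model (not Elasticsearch code) of one index whose field type is fixed by the first value it sees. The rule used here, where a whole number yields `long` and a fractional number yields `float`, is a simplification of dynamic mapping, and the truncation mimics how fractional values are coerced into an integer field. The document as sent is kept intact, which would explain why a document view can still show the float while aggregations only see 0 or 1.

```python
import math


def dynamic_type(value):
    # Simplified dynamic mapping: a whole JSON number maps to "long",
    # a number with a fractional part maps to "float".
    return "long" if isinstance(value, int) else "float"


class ToyIndex:
    """One index: a field's type is fixed by the template or by its first value."""

    def __init__(self, template=None):
        self.mapping = dict(template or {})
        self.sources = []   # documents exactly as sent (what a document view shows)
        self.indexed = {}   # field -> values as indexed (what aggregations use)

    def index(self, doc):
        self.sources.append(doc)
        for field, value in doc.items():
            ftype = self.mapping.setdefault(field, dynamic_type(value))
            stored = math.trunc(value) if ftype == "long" else float(value)
            self.indexed.setdefault(field, []).append(stored)


cpu_samples = [0, 0.25, 0.4, 1.0]  # the first value happens to arrive as a plain 0

before = ToyIndex()                                        # data arrived before the template
after = ToyIndex(template={"proc.cpu.total_p": "float"})  # template loaded first
for value in cpu_samples:
    before.index({"proc.cpu.total_p": value})
    after.index({"proc.cpu.total_p": value})

print(before.indexed["proc.cpu.total_p"])  # → [0, 0, 0, 1]
print(after.indexed["proc.cpu.total_p"])   # → [0.0, 0.25, 0.4, 1.0]
print(before.sources[1])                   # → {'proc.cpu.total_p': 0.25}
```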
Thanks, it looks like this is exactly what was happening. I had topbeat instances running while clearing the index and applying the template, so it is very likely that some of the data got inserted in int format before the template took effect. The unexpected part is that ES converts all values to int for direct queries, based on just a few initial values, while still displaying proper 'float' values in Kibana. Stopping all the instances, doing a full cleanup of the index and then applying the template fixed this issue, and my queries work as expected now. Many thanks for your assistance with this.
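The cleanup that fixed it can be sketched in Python with `urllib.request` from the standard library. This sketch only builds the two requests (delete the old indices, then load the index template) so they can be reviewed before sending. The base URL, the `topbeat-*` pattern, the `topbeat` template name and the template body are placeholders; use the template file that ships with your topbeat version. As in the thread, every topbeat instance should be stopped before these requests are sent and restarted only afterwards.

```python
import json
import urllib.request


def reset_requests(base_url, index_pattern, template_name, template):
    """Build (but do not send) the requests for a clean restart.

    1. DELETE the existing indices, including any with an integer mapping.
    2. PUT the index template so new indices get the float mapping.
    Stop all topbeat instances before sending these; restart them afterwards.
    """
    delete_indices = urllib.request.Request(
        f"{base_url}/{index_pattern}", method="DELETE"
    )
    put_template = urllib.request.Request(
        f"{base_url}/_template/{template_name}",
        data=json.dumps(template).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    return [delete_indices, put_template]


# Placeholder template body; load the real one from the file shipped with topbeat.
template = {"template": "topbeat-*"}
for req in reset_requests("http://localhost:9200", "topbeat-*", "topbeat", template):
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually send
```

Building the requests separately from sending them keeps the destructive DELETE visible for a final check before anything is removed.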
Issue seems resolved, thanks @monicasarbu.
OK, I've spent almost two days dealing with this problem and I'm going a bit crazy now. I'm possibly missing something silly and obvious.
Setup:
Problem:
Trying to graph per-process/per-host CPU usage data via Grafana. It looks like there is some float rounding issue for the CPU-specific "_p" metrics. When querying ES directly, the retrieved value is always either 0.0 or 1.0, whereas it should be a float value within that range (where 1.0 is 100%). The query I use was initially constructed by Grafana and it looks as follows (proc.pid is unique in this scenario):
This results in a bunch of metrics with a value of 0 (and occasionally 1.0, where CPU usage is >=100%), like:
The problem also exists for host-level CPU % metrics like 'cpu.system_p'. It does NOT occur for RAM-related % metrics such as 'proc.mem.rss_p'; for instance, the following query works fine:
The result is:
The problem also does NOT exist for any non-% metrics, like 'proc.cpu.total' or 'cpu.system'; for those the retrieved data is valid (although not terribly useful).
As mentioned, the data for all '_p' values visible in Kibana seems to be correct for all cases, for instance:
Additionally:
Troubleshooting:
I have absolutely no clue what is going on here, as it seems logical that querying 'proc.mem.rss_p' should produce the same kind of behavior as 'proc.cpu.total_p', but it does not. It is possible I am missing something obvious (some problem with the Grafana query?) and I would be very grateful for any advice.
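One quick way to tell an integer-mapped field from genuinely idle processes is to compare `proc.cpu.total_p` and `proc.mem.rss_p` side by side: if every bucket of a percentage field comes back as a whole number, truncation is the likely explanation. The sketch below assumes a terms aggregation on `proc.pid` with a `max` sub-aggregation (a max over truncated values is always whole, unlike an average); the aggregation names `by_pid` and `max_value` and the sample numbers are hypothetical.

```python
def bucket_values(response, agg_name, metric_name):
    """Collect the metric value from every bucket of a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [b[metric_name]["value"] for b in buckets]


def looks_truncated(values, min_samples=5):
    """Heuristic: True when every non-null value is a whole number.

    Needs a few samples, since a handful of idle processes can legitimately
    report 0.0.
    """
    present = [v for v in values if v is not None]
    if len(present) < min_samples:
        return False
    return all(float(v).is_integer() for v in present)


def fake_response(values):
    # Hypothetical response shape: terms on proc.pid, max sub-aggregation "max_value".
    return {
        "aggregations": {
            "by_pid": {
                "buckets": [
                    {"key": pid, "doc_count": 6, "max_value": {"value": v}}
                    for pid, v in enumerate(values, start=1000)
                ]
            }
        }
    }


cpu = bucket_values(fake_response([0.0, 0.0, 1.0, 0.0, 0.0]), "by_pid", "max_value")
mem = bucket_values(fake_response([0.12, 0.03, 0.4, 0.01, 0.07]), "by_pid", "max_value")
print(looks_truncated(cpu), looks_truncated(mem))  # → True False
```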