
"UNK Unable to get ElastiCache details and statistics" when using "memory" metric #4

mvolhontseff opened this issue Sep 8, 2015 · 3 comments


@mvolhontseff

When executing the following, the script returns "UNK Unable...":

check_elasticache.py --region us-east-1 -i cluster1 -m memory -w 10 -c 5

I traced this to an issue with the metrics dict (used by the function get_cluster_stats):

metrics = {'status': 'ElastiCache availability',
           'cpu': 'CPUUtilization',
           'memory': 'BytesUsedForCache',  # <== problem: this metric doesn't exist
           'swap': 'SwapUsage'}

I wasn't able to find a metric named "BytesUsedForCache" in the list of available CloudWatch metrics.

I did find BytesUsedForCacheItems and FreeableMemory, however.
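
A quick way to confirm which metric names CloudWatch actually publishes is to list the AWS/ElastiCache namespace. A minimal sketch with boto3 (illustrative only, and likely a newer SDK than the plugin itself uses):

import boto3

# Enumerate the metric names CloudWatch publishes in the AWS/ElastiCache
# namespace, to verify which memory-related metrics actually exist.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
paginator = cloudwatch.get_paginator("list_metrics")

for page in paginator.paginate(Namespace="AWS/ElastiCache"):
    for metric in page["Metrics"]:
        print(metric["MetricName"])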

@fr3nd

fr3nd commented Sep 14, 2015

This metric only makes sense for Redis ElastiCache instances. If you're using memcached, it's not available.

@mvolhontseff (Author)

OK, thanks for the clarification; this was for memcached.

Since this plugin is designed to be used for both Redis and memcached, does it make sense to use the Host-Level metric "FreeableMemory" instead?

http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/CacheMetrics.HostLevel.html
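
If that sounds reasonable, the change would presumably be limited to that one dict entry. A sketch of what I mean (illustrative only; the warning/critical thresholds would also need their meaning inverted, since FreeableMemory reports bytes still free rather than bytes used):

metrics = {'status': 'ElastiCache availability',
           'cpu': 'CPUUtilization',
           'memory': 'FreeableMemory',  # host-level, so reported for both Redis and memcached
           'swap': 'SwapUsage'}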

@DdPerna

DdPerna commented Dec 29, 2016

I am having a similar issue with Redis ElastiCache instances.

The difference is that the check does work, but periodically it shows "UNK Unable to get ElastiCache details and statistics", which results in Nagios alerting.
It seems to happen randomly, about twice a week, on a replica node. When this happens I run the command

check_elasticache.py -r us-east-1 -i example-redis-002 -m memory -w 80 -c 90

and it bounces between returning a result and showing unknown for a couple of minutes, then returns to normal. I'm not sure whether the issue is with the AWS API, or whether the start and end time used to fetch the metric is the cause.
I noticed that the cpu metric takes into account a delay for the metric updating in CloudWatch, and was wondering whether adding the same delay for the memory metric would fix the issue.
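
Something like the following is what I had in mind, assuming the check builds its query window from the current time; the five-minute lag and the boto3 call here are illustrative, not the plugin's actual code:

from datetime import datetime, timedelta

import boto3

# Hypothetical lag, mirroring what the cpu check reportedly does: shift the
# query window back so CloudWatch has had time to publish the datapoints.
CLOUDWATCH_DELAY = timedelta(minutes=5)

def get_memory_datapoints(cluster_id, region):
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    end = datetime.utcnow() - CLOUDWATCH_DELAY
    start = end - timedelta(minutes=10)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName="BytesUsedForCache",
        Dimensions=[{"Name": "CacheClusterId", "Value": cluster_id}],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=["Average"],
    )
    return resp["Datapoints"]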
