We collect the following YARN metrics (a collection sketch follows the list):
- Active Nodes
- Containers Allocated
- Containers Pending
- Cluster Memory Usage
- Unhealthy Nodes
- Apps Running
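A minimal sketch of how these metrics can be pulled from the YARN ResourceManager REST API (`/ws/v1/cluster/metrics`), assuming the ResourceManager is reachable on its default HTTP port. The output metric names and the memory-usage percentage are illustrative choices, not fixed by the original tool.

```python
# Sketch: collect the YARN metrics listed above from the ResourceManager REST API.
import requests

RM_METRICS_URL = "http://localhost:8088/ws/v1/cluster/metrics"  # adjust host/port for your cluster

def collect_yarn_metrics():
    # The clusterMetrics object returned by the endpoint maps directly
    # onto the metrics listed above.
    m = requests.get(RM_METRICS_URL, timeout=10).json()["clusterMetrics"]
    return {
        "ActiveNodes": m["activeNodes"],
        "ContainersAllocated": m["containersAllocated"],
        "ContainersPending": m["containersPending"],
        # Express memory usage as a percentage of total cluster memory.
        "ClusterMemoryUsage": (100.0 * m["allocatedMB"] / m["totalMB"]) if m["totalMB"] else 0.0,
        "UnhealthyNodes": m["unhealthyNodes"],
        "AppsRunning": m["appsRunning"],
    }
```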
This is a simple Python program that is expected to run via cron on the JobTracker machine. We also collect the following Hadoop metrics (a collection sketch follows the list):
- Total Map Slots
- Total Reduce Slots
- Total Nodes
- Number of Map Slots Required
- Number of Reduce Slots Required
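A sketch of pulling the MRv1 slot metrics from the JobTracker's JMX JSON servlet (`/jmx`, available in Hadoop 1.x on the JobTracker HTTP port). The bean and attribute names below are assumptions and should be checked against the actual `/jmx` output on your cluster; "slots required" is computed here as occupied plus waiting slots.

```python
# Sketch: collect the MRv1 slot metrics listed above from the JobTracker.
# NOTE: the bean name and attribute names are assumptions -- verify them
# against http://<jobtracker>:50030/jmx on your cluster.
import requests

JT_JMX_URL = "http://localhost:50030/jmx"  # JobTracker HTTP port; adjust as needed

def collect_hadoop_metrics():
    beans = requests.get(JT_JMX_URL, timeout=10).json()["beans"]
    # Locate the JobTracker metrics bean (name pattern is an assumption).
    jt = next(b for b in beans if b.get("name", "").endswith("name=JobTrackerMetrics"))
    return {
        "TotalMapSlots": jt["map_slots"],
        "TotalReduceSlots": jt["reduce_slots"],
        "TotalNodes": jt["trackers"],
        "MapSlotsRequired": jt["occupied_map_slots"] + jt["waiting_maps"],
        "ReduceSlotsRequired": jt["occupied_reduce_slots"] + jt["waiting_reduces"],
    }
```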
We push all these metrics to CloudWatch periodically (a sketch of the push follows below). You can then create alarms that trigger autoscaling of Hadoop clusters. When visualized as Demand vs. Supply, it looks like this:
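A minimal sketch of the CloudWatch push using boto3 (the original program may use a different AWS client library). The namespace "HadoopCluster" and the region are illustrative; alarms on metrics such as ContainersPending or MapSlotsRequired can then drive scale-out policies.

```python
# Sketch: publish the collected metrics to CloudWatch under a custom namespace.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # region is illustrative

def push_metrics(metrics):
    # Each key/value pair becomes one custom metric; alarms and autoscaling
    # policies can then be attached to these metrics.
    cloudwatch.put_metric_data(
        Namespace="HadoopCluster",
        MetricData=[
            {"MetricName": name, "Value": float(value), "Unit": "Count"}
            for name, value in metrics.items()
        ],
    )
```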
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0