A lightweight system to automatically scale Kinesis Data Streams up and down based on throughput.
- Step 1: Metrics flow from the Kinesis Data Stream(s) into CloudWatch Metrics (Bytes/Sec, Records/Sec)
- Step 2: Two alarms, Scale Up and Scale Down, evaluate those metrics and decide when to scale
- Step 3: When a scaling alarm triggers, it sends a message to the Scaling SNS Topic
- Step 4: The Scaling Lambda processes that SNS message and…
  - Scales the Kinesis Data Stream up or down using UpdateShardCount (a minimal sketch follows this list)
    - Scale Up events double the number of shards in the stream
    - Scale Down events halve the number of shards in the stream
  - Updates the metric math on the Scale Up and Scale Down alarms to reflect the new shard count.
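The Step 4 scaling action can be illustrated with the short Go sketch below. It shows only the double/halve logic around the `UpdateShardCount` call (using the AWS SDK for Go v1); the `scaleStream` helper, stream name, shard counts, and minimum-shard clamp are hypothetical, and the real `scale.go` additionally rewrites the alarm metric math and reads its inputs from the triggering SNS message.

```go
// Minimal sketch of the Step 4 scaling action, not the full Lambda handler.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/kinesis"
)

// scaleStream doubles the shard count on scale-up and halves it on scale-down,
// clamps the result to a minimum, then calls UpdateShardCount on the stream.
func scaleStream(streamName string, currentShards int64, scaleUp bool, minShards int64) (int64, error) {
	target := currentShards / 2
	if scaleUp {
		target = currentShards * 2
	}
	if target < minShards {
		target = minShards
	}

	svc := kinesis.New(session.Must(session.NewSession()))
	_, err := svc.UpdateShardCount(&kinesis.UpdateShardCountInput{
		StreamName:       aws.String(streamName),
		TargetShardCount: aws.Int64(target),
		ScalingType:      aws.String(kinesis.ScalingTypeUniformScaling),
	})
	return target, err
}

func main() {
	// Example: scale a hypothetical stream up from 4 shards, never dropping below 2.
	newCount, err := scaleStream("my-stream", 4, true, 2)
	if err != nil {
		log.Fatalf("UpdateShardCount failed: %v", err)
	}
	fmt.Printf("stream now targets %d shards\n", newCount)
}
```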
- Designed for simplicity and a minimal service footprint.
- Proven. This system has been battle-tested, scaling thousands of production streams without issue.
- Suitable for scaling very large numbers of streams. Each additional stream requires only two CloudWatch alarms.
- Operations friendly. Everything is viewable/editable/debuggable in the console; no need to drop into the CLI to see what's going on.
- Takes into account both ingress metrics, Records Per Second and Bytes Per Second, when deciding whether to scale a stream up or down.
- Can optionally take into account egress needs via Max Iterator Age, so streams that are N minutes behind (configurable) do not scale down and lose much-needed Lambda processing power (Lambdas per shard) just because incoming traffic dropped (see the sketch after this list).
- Already designed out of the box to work within the limit of 10 UpdateShardCount calls per rolling 24-hour period.
- Emits a custom CloudWatch error metric if scaling fails; you can alarm on this for added peace of mind.
- Can optionally adjust reserved concurrency for your Lambda consumers as it scales their streams up and down.
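The optional iterator-age guard mentioned above can be sketched as follows: before halving shards, query the stream's standard `AWS/Kinesis` `GetRecords.IteratorAgeMilliseconds` metric and skip the scale-down if consumers are too far behind. The helper name, lookback window, and threshold shown here are illustrative assumptions, not the module's exact implementation.

```go
// Sketch of a scale-down guard based on the stream's max iterator age.
package main

import (
	"log"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatch"
)

// maxIteratorAgeMinutes returns the maximum iterator age (in minutes) observed
// on the stream over the given lookback window.
func maxIteratorAgeMinutes(streamName string, lookback time.Duration) (float64, error) {
	svc := cloudwatch.New(session.Must(session.NewSession()))
	end := time.Now()
	out, err := svc.GetMetricStatistics(&cloudwatch.GetMetricStatisticsInput{
		Namespace:  aws.String("AWS/Kinesis"),
		MetricName: aws.String("GetRecords.IteratorAgeMilliseconds"),
		Dimensions: []*cloudwatch.Dimension{
			{Name: aws.String("StreamName"), Value: aws.String(streamName)},
		},
		StartTime:  aws.Time(end.Add(-lookback)),
		EndTime:    aws.Time(end),
		Period:     aws.Int64(60),
		Statistics: []*string{aws.String(cloudwatch.StatisticMaximum)},
	})
	if err != nil {
		return 0, err
	}
	var maxMs float64
	for _, dp := range out.Datapoints {
		if dp.Maximum != nil && *dp.Maximum > maxMs {
			maxMs = *dp.Maximum
		}
	}
	return maxMs / 1000 / 60, nil
}

func main() {
	const scaleDownMinIterAgeMins = 30 // mirrors kinesis_scale_down_min_iter_age_mins

	age, err := maxIteratorAgeMinutes("my-stream", 15*time.Minute)
	if err != nil {
		log.Fatalf("could not read iterator age: %v", err)
	}
	if age > scaleDownMinIterAgeMins {
		log.Printf("consumers are %.1f minutes behind; skipping scale-down", age)
		return
	}
	log.Println("iterator age is healthy; safe to scale down")
}
```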
Name | Description | Type | Default | Required |
---|---|---|---|---|
enable_slack_notification | Enable Slack notification | bool | false | no |
encryption_type | Encryption type | string | KMS | no |
kinesis_cooldown_mins | Cooldown period in minutes | number | 10 | no |
kinesis_scale_down_datapoints_required | Number of datapoints required within the evaluation period to trigger the scale-down alarm | number | 285 | no |
kinesis_scale_down_evaluation_period | Period after which the alarm data is evaluated for scaling down | number | 300 | no |
kinesis_scale_down_min_iter_age_mins | Compared with the stream's max iterator age; if the stream's max iterator age is above this, the stream will not scale down | number | 30 | no |
kinesis_scale_down_threshold | Scale-down threshold | number | 0.25 | no |
kinesis_scale_up_datapoints_required | Number of datapoints required within the evaluation period to trigger the scale-up alarm | number | 25 | no |
kinesis_scale_up_evaluation_period | Period after which the alarm data is evaluated for scaling up | number | 25 | no |
kinesis_scale_up_threshold | Scale-up threshold | number | 0.75 | no |
kinesis_scaling_period_mins | Scaling period in minutes | number | 5 | no |
kms_key_id | KMS key | string | n/a | yes |
min_shard_count | Minimum number of shards (must be greater than zero) | number | 5 | yes |
shard_count | Number of shards | number | 1 | no |
slack_web_hook_url | Slack webhook URL | string | n/a | yes |
stream_name | Stream name | string | n/a | yes |
stream_retention_period | Stream retention period in hours | number | 24 | no |
tags | Map of tags that should be applied to all resources | map(string) | n/a | yes |
Name | Description |
---|---|
kinesis_stream_arn | ARN of the Kinesis Data Stream |
To generate traffic on your streams you can use Kinesis Data Generator.
Simply edit the `scale.go` file as needed and run `./build` to generate a `main` file suitable for Lambda deployment. Go 1.15.x is recommended.
Terraform configuration for this solution: https://github.com/aws-samples/kinesis-auto-scaling/tree/main/terraform
`ignore_changes` for the shard count has been removed, as it caused inconsistencies with the CloudWatch metrics when the shard count is updated outside of Terraform.