Research and PoC - Turn off instances in an environment overnight #1091

Closed
4 tasks
davidkelliott opened this issue Sep 14, 2021 · 10 comments
Labels: enhancement (New feature or request), sustainability


@davidkelliott
Contributor

User Story

There is no reason to keep non-production instances (EC2, RDS, etc.) running constantly. Outside working hours they should be shut down.

Value

Save money and increase the sustainability of the platform. Making instances inaccessible at night also encourages work-life balance.

Questions / Assumptions

  • There should be some way of excluding instances, e.g. a tag.
  • Production instances should not be included.
  • This would need to work across all the MP accounts. Could it be done centrally, or would it need to be done on a per-account basis?
  • Possible solutions include Systems Manager Automation documents or a Lambda function.
  • We need to ensure instances can be started and stopped on a schedule without human intervention; application teams will also need to ensure this.

https://aws.amazon.com/blogs/mt/systems-manager-automation-documents-manage-instances-cut-costs-off-hours/

Definition of done

  • readme has been updated
  • user docs have been updated
  • another team member has reviewed
  • tests are green

Reference

How to write good user stories

davidkelliott changed the title from "Turn off all non production instances overnight" to "Research and PoC - Turn off all non production instances overnight" on Nov 2, 2021
davidkelliott changed the title from "Research and PoC - Turn off all non production instances overnight" to "Research and PoC - Turn off instances in a test environment overnight" on Nov 2, 2021
davidkelliott added this to the Sprint 10 milestone on Nov 2, 2021
gfou-al self-assigned this on Nov 9, 2021
@gfou-al
Contributor

gfou-al commented Nov 11, 2021

#1253

@gfou-al
Contributor

gfou-al commented Nov 15, 2021

One solution I got working on Sprinkler uses the third-party Terraform module terraform-aws-lambda-scheduler-stop-start, as follows:

#------------------------------------------------------------------------------
# Schedule stop/start EC2 instances
#------------------------------------------------------------------------------

module "stop_ec2_instance_nights" {
  source                         = "github.com/diodonfrost/terraform-aws-lambda-scheduler-stop-start?ref=3.1.3"
  name                           = "stop_ec2_instance_nights"
  cloudwatch_schedule_expression = "cron(0 21 ? * * *)" # Every day at 21:00 UTC (EventBridge cron runs in UTC)
  schedule_action                = "stop"
  autoscaling_schedule           = "false"
  ec2_schedule                   = "true"
  rds_schedule                   = "false"
  cloudwatch_alarm_schedule      = "false"
  scheduler_tag                  = {
    key   = "stop_nights"
    value = "test" # only for the test environment
  }
}

module "start_ec2_instance_mornings" {
  source                         = "github.com/diodonfrost/terraform-aws-lambda-scheduler-stop-start?ref=3.1.3"
  name                           = "start_ec2_instance_mornings"
  cloudwatch_schedule_expression = "cron(0 6 ? * * *)" # Every day at 06:00 UTC (EventBridge cron runs in UTC)
  schedule_action                = "start"
  autoscaling_schedule           = "false"
  ec2_schedule                   = "true"
  rds_schedule                   = "false"
  cloudwatch_alarm_schedule      = "false"
  scheduler_tag                  = {
    key   = "stop_nights"
    value = "test" # only for the test environment
  }
}

resource "aws_kms_grant" "stop_start_scheduler" {
  key_id            = aws_kms_key.ebs.id
  grantee_principal = module.start_ec2_instance_mornings.lambda_iam_role_arn
  operations        = [
    "Decrypt",
    "DescribeKey",
    "CreateGrant"
  ]
}
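An instance is only picked up by the two scheduler modules above if it carries the exact key/value pair set in their `scheduler_tag` (`stop_nights = "test"`). The following is a minimal sketch of an opted-in instance. The `ami_id` variable, the instance type and the `Name` tag are hypothetical placeholders, not part of the Sprinkler setup.

```hcl
# Hypothetical input: supply a real AMI ID for the account
variable "ami_id" {
  description = "AMI to launch (hypothetical placeholder)"
  type        = string
}

resource "aws_instance" "scheduled_test" {
  ami           = var.ami_id
  instance_type = "t3.micro" # placeholder size

  tags = {
    Name = "scheduled-test-instance" # hypothetical name
    # Must match scheduler_tag in both modules, so the instance is
    # stopped at 21:00 UTC and started again at 06:00 UTC
    stop_nights = "test"
  }
}
```

Removing the tag, or giving it any other value, takes the instance out of the schedule, which is one way to provide the exclusion mechanism asked for in the issue.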

@gfou-al
Contributor

gfou-al commented Nov 15, 2021

#1260

@gfou-al
Contributor

gfou-al commented Nov 16, 2021

After today's scrum we discussed this as a team and decided that AWS Instance Scheduler is another alternative, and might be the closest to what we will need in the end: https://aws.amazon.com/solutions/implementations/instance-scheduler/. While the Terraform module would work, AWS Instance Scheduler supports cross-account and flexible scheduling, which might be beneficial for configuration at platform level. The only downside is that it's a new feature and there doesn't seem to be a Terraform module for it yet. We need a separate PoC to see how this solution would work.

davidkelliott removed this from the Sprint 10 milestone on Nov 16, 2021
@dms1981
Contributor

dms1981 commented Jan 7, 2022

I've used AWS Instance Scheduler before; by default it's supplied as a CloudFormation template. It could be migrated into Terraform with some work, although (from memory) there are some interactions with DynamoDB which are configured using an AWS-supplied Python CLI package. Again from memory, I believe those can either be carried out manually or ported into Terraform, as they're just DynamoDB entries (the relevant resources in the AWS provider are probably aws_dynamodb_table and aws_dynamodb_table_item).
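As a rough illustration of porting those DynamoDB entries into Terraform, the sketch below defines one Instance Scheduler "period" with `aws_dynamodb_table_item`. It assumes the config table layout used by version 1.x of the solution (partition key `type`, sort key `name`, period attributes `begintime`, `endtime` and `weekdays`); check this against the deployed template before relying on it. The table name is passed in as a variable because the CloudFormation stack generates it, and the period name and hours are hypothetical.

```hcl
# Assumed: name of the ConfigTable created by the Instance Scheduler stack
variable "scheduler_config_table_name" {
  description = "Instance Scheduler config table name (from the CloudFormation stack)"
  type        = string
}

# A "period" entry: the hours during which instances should be running
resource "aws_dynamodb_table_item" "office_hours_period" {
  table_name = var.scheduler_config_table_name
  hash_key   = "type" # assumed partition key of the config table
  range_key  = "name" # assumed sort key of the config table

  # DynamoDB JSON: each attribute is wrapped in its type (S = string, SS = string set)
  item = jsonencode({
    type        = { S = "period" }
    name        = { S = "office-hours" } # hypothetical period name
    description = { S = "Hypothetical working hours" }
    begintime   = { S = "07:00" }
    endtime     = { S = "19:00" }
    weekdays    = { SS = ["mon-fri"] }
  })
}
```

If these items are managed in Terraform, they should not also be edited through the CLI package, or the two will drift apart.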

davidkelliott moved this from Done to Backlog in Modernisation Platform on Jan 17, 2022
davidkelliott changed the title from "Research and PoC - Turn off instances in a test environment overnight" to "Research and PoC - Turn off instances in an environment overnight" on Feb 8, 2022
dms1981 self-assigned this on Feb 14, 2022
dms1981 moved this from Backlog to In Progress in Modernisation Platform on Feb 16, 2022
@dms1981
Contributor

dms1981 commented Feb 16, 2022

Having looked into this, the following things stand out:

  1. Instance Scheduler is supplied as two CloudFormation templates. Would we recreate these in Terraform, and if so, where would we store the code?
  2. In which account would the main Instance Scheduler resources be deployed?
  3. The scheduler works by reading a tag (e.g. availability) and matching its value to a schedule defined in DynamoDB (e.g. weekdays). Is a single tag/key relationship enough for us?
  4. Do we expect this to be something teams opt into or opt out of?
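To make point 3 concrete, here is a minimal sketch of the "schedule" side of that relationship, again assuming the version 1.x config table layout (partition key `type`, sort key `name`, schedule attributes `timezone` and `periods`). The schedule name `weekdays` is the value an instance's tag would carry; the tag key itself is a stack parameter (default `Schedule`), assumed here to have been set to `availability`. The referenced period name `office-hours` is hypothetical and must exist as a period item in the same table.

```hcl
# Assumed: name of the ConfigTable created by the Instance Scheduler stack
variable "scheduler_config_table_name" {
  description = "Instance Scheduler config table name (from the CloudFormation stack)"
  type        = string
}

# A "schedule" entry: ties a tag value to one or more periods
resource "aws_dynamodb_table_item" "weekdays_schedule" {
  table_name = var.scheduler_config_table_name
  hash_key   = "type" # assumed partition key
  range_key  = "name" # assumed sort key

  item = jsonencode({
    type     = { S = "schedule" }
    name     = { S = "weekdays" }             # value instances put in their tag
    timezone = { S = "Europe/London" }        # periods are evaluated in this zone
    periods  = { SS = ["office-hours"] }      # hypothetical period name
  })
}

# Tags an instance would carry to opt in (assumes the tag key is "availability")
locals {
  scheduled_instance_tags = {
    availability = "weekdays"
  }
}
```

Because each instance carries one tag value that names one schedule, richer behaviour (for example different hours per team) comes from defining more schedules and periods rather than more tags.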

@dms1981
Contributor

dms1981 commented Feb 18, 2022

  1. For the proof of concept we're using the Amazon-supplied CloudFormation templates.
  2. We've discussed internally which accounts to make use of.
  3. We've agreed that the scheduler is sufficient as is.
  4. We've agreed that teams will opt out of the scheduler rather than opt in.

@dms1981
Contributor

dms1981 commented Feb 18, 2022

At present we have both CloudFormation templates deployed, and CloudWatch logs show that we're successfully assuming a role in a remote account and querying instances appropriately. We also see an error message indicating that the schedule we've set up does not match the schedule name being read, even though the two appear identical.
I have raised a support request in the calling AWS account to see what support AWS can offer us.

@dms1981
Contributor

dms1981 commented Feb 22, 2022

Resolved the issue we had with schedules not being read.
Confirmed that we can apply a schedule across accounts and shut down / start up instances based on schedules.
There may be more value we can drive out of this card, but overall it looks like this will meet our needs. I will demo it and discuss next steps with @davidkelliott and the rest of the team.

@dms1981
Contributor

dms1981 commented Feb 22, 2022

Successfully demonstrated. Will raise issues to move this into production use.

dms1981 closed this as completed on Feb 22, 2022
Repository owner moved this from In Progress to Done in Modernisation Platform Feb 22, 2022
Projects
Archived in project
Development

No branches or pull requests

3 participants