This repo contains an interactive script which can be used to roll back a corrupt config file for the DNS or DHCP services.
- AWS Vault configured for the corrupted environment
- jq, to slice, filter, map, and transform structured data
In the event that Grafana has alerted on a disaster scenario, find the correct section and follow the steps provided.
To be able to follow this guide, you need to have the following already:
- AWS Vault set up.
- Access to MoJ AWS SSO.
| 🎉 TIP |
|---|
| You may configure AWS Vault to use AWS SSO. A step-by-step guide can be found on our team documentation site. |
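
Before starting, it can help to confirm that the command-line tools this guide relies on are installed. The sketch below is illustrative only: `missing_tools` is a hypothetical helper that is not part of this repo, and the tool list in the usage comment is inferred from the commands used in this guide.

```shell
# Hypothetical helper (not part of this repo).
# Prints each command that cannot be found on PATH, one per line,
# and returns non-zero if at least one is missing.
missing_tools() {
  local tool rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example usage (tool list is an assumption based on this guide):
#   missing_tools aws-vault jq terraform make
```

If the function prints nothing, every listed tool was found.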
- Clone this repo to a local directory.
make init
The first time you run it, it cleans the `.terraform` directory and creates a new `.env` file with values retrieved from AWS SSM Parameter Store.
Then run the above command again.
If you do not have a Terraform workspace created already, use the command below to create a new workspace.
aws-vault exec <shared-services-aws-vault-profile> -- terraform workspace new "YOUR_UNIQUE_WORKSPACE_NAME"
This should create a new workspace and select that new workspace at the same time.
If you already have a workspace, use the commands below to list the workspaces and select the right one before continuing.
aws-vault exec <shared-services-aws-vault-profile> -- terraform workspace list
aws-vault exec <shared-services-aws-vault-profile> -- terraform workspace select "YOUR_WORKSPACE_NAME"
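
The two cases above (create a new workspace, or select an existing one) could be combined into a single helper. This is a minimal sketch, not something the repo provides: `ensure_workspace` and the `AWS_VAULT` override variable are hypothetical names, and the sketch assumes that a failed `select` means the workspace does not exist yet.

```shell
# AWS_VAULT is a hypothetical override so the binary can be swapped out;
# it defaults to the real aws-vault command.
: "${AWS_VAULT:=aws-vault}"

# Hypothetical helper: select the workspace if it exists, otherwise create it.
#   $1 = aws-vault profile, $2 = Terraform workspace name
ensure_workspace() {
  local profile="$1" name="$2"
  if "$AWS_VAULT" exec "$profile" -- terraform workspace select "$name" >/dev/null 2>&1; then
    echo "selected: $name"
  else
    # Assumption: select failed because the workspace does not exist yet.
    "$AWS_VAULT" exec "$profile" -- terraform workspace new "$name" >/dev/null &&
      echo "created: $name"
  fi
}
```

Note that `select` can also fail for other reasons, such as expired credentials, in which case the `new` step will fail too and nothing is printed.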
make apply
In the event that the RDS instance (staff-device-production-dhcp-db and/or staff-device-production-dhcp-admin-db) needs to be restored (e.g. due to data loss or instance failure), it can be restored using the daily automatic snapshots. This can be done using the AWS console or the AWS CLI.
To restore the database from a snapshot, follow the steps below:
- Go to the RDS console in AWS
- Navigate to the 'System' tab of the Snapshot window.
- Select the latest snapshot to restore e.g. rds:staff-device-production-dhcp-db-2025-01-06-22-36, and click 'Restore snapshot' in the Actions dropdown.
- Enter the details required, such as the DB identifier and security group IDs.
- Press 'Restore DB instance'
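
For the AWS CLI route, the console steps above might look roughly like the sketch below. The function names are hypothetical, and only the DB identifier and security group options mentioned in the steps are passed. Your restore may need further options (for example a subnet group), so treat this as a starting point rather than the team's procedure. Run it inside an `aws-vault exec` session for the affected account.

```shell
# Hypothetical helper: print the identifier of the most recent automated
# snapshot for a DB instance, e.g. staff-device-production-dhcp-db.
latest_automated_snapshot() {
  aws rds describe-db-snapshots \
    --db-instance-identifier "$1" \
    --snapshot-type automated \
    --query 'sort_by(DBSnapshots, &SnapshotCreateTime)[-1].DBSnapshotIdentifier' \
    --output text
}

# Hypothetical helper: restore a snapshot into a DB instance.
#   $1 = snapshot identifier, $2 = DB identifier for the restored instance,
#   $3 = VPC security group ID
restore_from_snapshot() {
  local snapshot_id="$1" new_instance_id="$2" security_group_id="$3"
  aws rds restore-db-instance-from-db-snapshot \
    --db-snapshot-identifier "$snapshot_id" \
    --db-instance-identifier "$new_instance_id" \
    --vpc-security-group-ids "$security_group_id"
}
```

Restoring from a snapshot creates a new DB instance, so the identifier passed as the second argument must not already be in use.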
- Identify the broken service (dns/dhcp) and environment (development/pre-production/production)
- Run:
aws-vault exec CORRUPT_ENVIRONMENT_VAULT_PROFILE_NAME -- make restore-dns-dhcp-config
- At the prompt, enter the environment name (development/pre-production/production)
- At the second prompt, enter the corrupt service name (dns/dhcp)
- You will be given an output of the last five published configs with their `VersionId` and `LastModified`
- Copy the `VersionId` of the config you wish to restore
- At the final prompt, paste the `VersionId`
- The terminal will exit with the following message:
Successfully rolled back dhcp to version: VersionId
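
To inspect the versions by hand, the sketch below shows one way the same information could be listed and rolled back with the AWS CLI. It assumes the configs live in a versioned S3 bucket, which the `VersionId` and `LastModified` fields suggest but this guide does not state. The function names, bucket and key are all hypothetical, and this is not necessarily how the script itself works.

```shell
# Hypothetical helper: show the five most recent versions of a config object.
#   $1 = bucket name (placeholder), $2 = object key (placeholder)
# S3 returns versions of a key most recent first; if the prefix matches
# more than one key, the first five rows may span several objects.
list_config_versions() {
  local bucket="$1" key="$2"
  aws s3api list-object-versions \
    --bucket "$bucket" \
    --prefix "$key" \
    --query 'Versions[:5].{VersionId: VersionId, LastModified: LastModified}' \
    --output table
}

# Hypothetical helper: make an older version current again by copying it
# over the same key.
#   $1 = bucket, $2 = key, $3 = VersionId to restore
restore_config_version() {
  local bucket="$1" key="$2" version_id="$3"
  aws s3api copy-object \
    --bucket "$bucket" \
    --key "$key" \
    --copy-source "${bucket}/${key}?versionId=${version_id}"
}
```

Copying an old version over the key adds a new current version and leaves the version history intact.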
- Identify the broken service (dns/dhcp) and environment (development/pre-production/production)
- Run:
aws-vault exec CORRUPT_ENVIRONMENT_VAULT_PROFILE_NAME -- make restore-service-container
- At the prompt, enter the environment name (development/pre-production/production)
- At the second prompt, enter the corrupt service name (dns/dhcp)
- You will be given an output of the last five pushed containers with their `imageDigest` and `imagePushedAt`
- Copy the `imageDigest` of the container you wish to re-tag as latest
- At the final prompt, paste the `imageDigest`
- The terminal will exit with the following message:
Successfully re-tagged image: imageDigest as latest
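
For reference, re-tagging an image by digest can also be done directly with the AWS CLI. The sketch below assumes the containers are stored in Amazon ECR, as the `imageDigest` and `imagePushedAt` fields suggest. The function names and the repository name argument are hypothetical, and the script in this repo may work differently.

```shell
# Hypothetical helper: show the five most recently pushed images.
#   $1 = ECR repository name (placeholder)
list_recent_images() {
  local repo="$1"
  aws ecr describe-images \
    --repository-name "$repo" \
    --query 'reverse(sort_by(imageDetails, &imagePushedAt))[:5].{imageDigest: imageDigest, imagePushedAt: imagePushedAt}' \
    --output table
}

# Hypothetical helper: tag an existing image as "latest" without pulling it.
#   $1 = ECR repository name (placeholder), $2 = imageDigest (sha256:...)
retag_as_latest() {
  local repo="$1" digest="$2" manifest
  # Fetch the manifest of the image identified by its digest.
  manifest="$(aws ecr batch-get-image \
    --repository-name "$repo" \
    --image-ids "imageDigest=$digest" \
    --query 'images[0].imageManifest' \
    --output text)" || return 1
  # Push the same manifest back under the "latest" tag.
  aws ecr put-image \
    --repository-name "$repo" \
    --image-tag latest \
    --image-manifest "$manifest"
}
```

As with the make targets, run these inside an `aws-vault exec` session for the affected environment.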
- Run:
aws-vault exec CORRUPT_ENVIRONMENT_VAULT_PROFILE_NAME -- make restore-admin-container
- At the prompt, enter the environment name (development/pre-production/production)
- You will be given an output of the last five pushed containers with their `imageDigest` and `imagePushedAt`
- Copy the `imageDigest` of the container you wish to re-tag as latest
- At the final prompt, paste the `imageDigest`
- The terminal will exit with the following message:
Successfully re-tagged image: imageDigest as latest
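
Because the digest is copied and pasted by hand, a quick format check before pasting can catch a truncated value. The helper below is a hypothetical sketch and not part of the repo. It assumes the digest has the usual `sha256:` prefix followed by 64 lowercase hexadecimal characters.

```shell
# Hypothetical helper: succeed only if $1 looks like a sha256 image digest,
# i.e. "sha256:" followed by exactly 64 lowercase hex characters.
is_image_digest() {
  case "$1" in
    sha256:*) ;;
    *) return 1 ;;
  esac
  local hex="${1#sha256:}"
  [ "${#hex}" -eq 64 ] || return 1
  case "$hex" in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}

# Example usage:
#   is_image_digest "$copied_digest" || echo "digest looks malformed"
```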