ministryofjustice/staff-device-dns-dhcp-disaster-recovery

Staff device DNS DHCP disaster recovery

This repo contains an interactive script which can be used to roll back a corrupt config file for the DNS or DHCP services.

Prerequisites

  • AWS Vault configured for the corrupted environment
  • jq, to slice, filter, map, and transform structured data
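
A quick way to confirm both tools are on your PATH before starting is sketched below. The helper name missing_tools is hypothetical and is not part of this repo.

```shell
# Print the name of every listed tool that cannot be found on PATH.
# Prints nothing when all of them are installed.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Run it as `missing_tools aws-vault jq`; no output means both prerequisites are available.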

Recovering from a Disaster

In the event that Grafana has alerted on a disaster scenario, find the correct section and follow the steps provided.

Infrastructure deployment

To be able to follow this guide, you need to have the following already:

🎉 TIP
You may configure your AWS Vault to use AWS SSO. A step-by-step guide can be found in our team documentation site.

Prepare the variables

  1. Clone this repo to a local directory.

Initialize your Terraform

make init

The first time you run this, it cleans the .terraform directory and creates a new .env file with values retrieved from AWS SSM Parameter Store.

Then run the above command again.

Switch to an isolated workspace

If you do not have a Terraform workspace created already, use the command below to create a new workspace.

Create Terraform workspace

aws-vault exec <shared-services-aws-vault-profile> -- terraform workspace new "YOUR_UNIQUE_WORKSPACE_NAME"

This should create a new workspace and select that new workspace at the same time.

If you already have a workspace, use the commands below to select the right one before continuing.

View Terraform workspace list

aws-vault exec <shared-services-aws-vault-profile> -- terraform workspace list

Select a Terraform workspace

aws-vault exec <shared-services-aws-vault-profile> -- terraform workspace select "YOUR_WORKSPACE_NAME"
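
The create-or-select choice above can be folded into one helper. This is only a sketch and not part of the repo: ensure_workspace, tf, and the TF_VAULT_PROFILE variable are hypothetical names, and it relies on `terraform workspace select` exiting non-zero when the workspace does not exist.

```shell
# Run terraform through aws-vault using the profile named in TF_VAULT_PROFILE
# (hypothetical variable; set it to your shared services profile).
tf() {
  aws-vault exec "$TF_VAULT_PROFILE" -- terraform "$@"
}

# Select the workspace if it already exists, otherwise create it.
# Creating a workspace also switches to it.
ensure_workspace() {
  tf workspace select "$1" || tf workspace new "$1"
}
```

For example, `TF_VAULT_PROFILE=<shared-services-aws-vault-profile> ensure_workspace "YOUR_UNIQUE_WORKSPACE_NAME"`. When the workspace is missing, the failed select prints an error before the workspace is created; that is expected.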

Finally, spin up your own infrastructure

make apply

Restore database

If an RDS instance (staff-device-production-dhcp-db and/or staff-device-production-dhcp-admin-db) needs to be restored, for example after data loss or instance failure, restore it from the daily automatic snapshots. This can be done using the AWS console or the AWS CLI.

To restore the database from a snapshot using the console, follow the steps below:

  1. Go to the RDS console in AWS.
  2. Open the Snapshots page and select the 'System' tab.
  3. Select the latest snapshot to restore, e.g. rds:staff-device-production-dhcp-db-2025-01-06-22-36, and click 'Restore snapshot' in the Actions dropdown.
  4. Enter the details required, such as the DB identifier and security group IDs.
  5. Click 'Restore DB instance'.
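
For the AWS CLI route, a minimal sketch might look like the following. The function names and their arguments are hypothetical, and the options shown cover only the details named in step 4; your restore may need more (for example a DB subnet group). Restoring from a snapshot always creates a new instance, so the new DB identifier must not already be in use.

```shell
# Print the identifier of the most recent automated snapshot for an instance.
# $1 = source DB instance identifier, e.g. staff-device-production-dhcp-db
latest_snapshot() {
  aws rds describe-db-snapshots \
    --db-instance-identifier "$1" \
    --snapshot-type automated \
    --query 'sort_by(DBSnapshots, &SnapshotCreateTime)[-1].DBSnapshotIdentifier' \
    --output text
}

# Restore a snapshot into a NEW instance.
# $1 = snapshot identifier, $2 = new DB instance identifier, $3 = security group ID
restore_snapshot() {
  aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier "$2" \
    --db-snapshot-identifier "$1" \
    --vpc-security-group-ids "$3"
}
```

Save the functions to a file and run it under `aws-vault exec <profile> -- bash <file>` so the aws calls pick up credentials, then call, for instance, `restore_snapshot "$(latest_snapshot staff-device-production-dhcp-db)" <new-db-identifier> <security-group-id>`.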

Corrupt Config

  1. Identify the broken service (dns/dhcp) and environment (development/pre-production/production)
  2. Run:
    1. aws-vault exec CORRUPT_ENVIRONMENT_VAULT_PROFILE_NAME -- make restore-dns-dhcp-config
    2. At the prompt, enter the environment name (development/pre-production/production)
    3. At the second prompt, enter the corrupt service name (dns/dhcp)
    4. You will be given an output of the last five published configs with their VersionId and LastModified
    5. Copy the VersionId of the config you wish to restore to
    6. At the final prompt, paste the VersionId
    7. The script will exit with the following message: Successfully rolled back dhcp to version: VersionId
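
For reference, the kind of S3 calls such a rollback could rest on are sketched below. This assumes the configs live in a versioned S3 bucket, which the VersionId and LastModified fields suggest but this README does not state. The function names and the bucket and key arguments are hypothetical, and the repo's script may work differently.

```shell
# List the five most recent versions of a config object.
# $1 = bucket name, $2 = object key
list_config_versions() {
  aws s3api list-object-versions \
    --bucket "$1" \
    --prefix "$2" \
    --query 'Versions[:5].[VersionId, LastModified]' \
    --output table
}

# Build the copy source string for a specific object version.
# Keys containing special characters would need URL encoding first.
copy_source() {
  printf '%s/%s?versionId=%s' "$1" "$2" "$3"
}

# Roll back by copying an old version over the same key; the copy
# becomes the new current version and the history is preserved.
# $1 = bucket name, $2 = object key, $3 = VersionId to restore
rollback_config() {
  aws s3api copy-object \
    --bucket "$1" \
    --key "$2" \
    --copy-source "$(copy_source "$1" "$2" "$3")"
}
```

Copying rather than deleting newer versions keeps the corrupt config available for later inspection.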

Corrupt Container

  1. Identify the broken service (dns/dhcp) and environment (development/pre-production/production)
  2. Run:
    1. aws-vault exec CORRUPT_ENVIRONMENT_VAULT_PROFILE_NAME -- make restore-service-container
    2. At the prompt, enter the environment name (development/pre-production/production)
    3. At the second prompt, enter the corrupt service name (dns/dhcp)
    4. You will be given an output of the last five pushed containers with their imageDigest and imagePushedAt
    5. Copy the imageDigest of the container you wish to re-tag as latest
    6. At the final prompt, paste the imageDigest
    7. The script will exit with the following message: Successfully re-tagged image: imageDigest as latest
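
The re-tagging step can be pictured with the plain AWS CLI, as in the sketch below. It assumes the containers are stored in Amazon ECR, which the imageDigest and imagePushedAt fields suggest. The function names and the repository argument are hypothetical, and the repo's script may do this differently.

```shell
# Succeed only if the argument looks like a sha256 image digest.
is_image_digest() {
  printf '%s' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# List the five most recently pushed images in a repository.
# $1 = ECR repository name
list_recent_images() {
  aws ecr describe-images \
    --repository-name "$1" \
    --query 'reverse(sort_by(imageDetails, &imagePushedAt))[:5].[imageDigest, imagePushedAt]' \
    --output table
}

# Re-tag an existing image as latest by re-putting its manifest.
# $1 = ECR repository name, $2 = imageDigest
retag_as_latest() {
  is_image_digest "$2" || { echo "not an image digest: $2" >&2; return 1; }
  manifest=$(aws ecr batch-get-image \
    --repository-name "$1" \
    --image-ids "imageDigest=$2" \
    --query 'images[0].imageManifest' \
    --output text) || return 1
  aws ecr put-image \
    --repository-name "$1" \
    --image-tag latest \
    --image-manifest "$manifest"
}
```

Validating the pasted value first guards against re-tagging with a truncated or mistyped digest.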

Corrupt Admin Container

  1. Run:
    1. aws-vault exec CORRUPT_ENVIRONMENT_VAULT_PROFILE_NAME -- make restore-admin-container
    2. At the prompt, enter the environment name (development/pre-production/production)
    3. You will be given an output of the last five pushed containers with their imageDigest and imagePushedAt
    4. Copy the imageDigest of the container you wish to re-tag as latest
    5. At the final prompt, paste the imageDigest
    6. The script will exit with the following message: Successfully re-tagged image: imageDigest as latest
