This Terraform module will create an Amazon S3 bucket for use on the Cloud Platform.
module "s3" {
source = "github.com/ministryofjustice/cloud-platform-terraform-s3-bucket?ref=version" # use the latest release
# S3 configuration
versioning = true
# Tags
business_unit = var.business_unit
application = var.application
is_production = var.is_production
team_name = var.team_name
namespace = var.namespace
environment_name = var.environment
infrastructure_support = var.infrastructure_support
}
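If your application needs to discover the bucket at runtime, one option is to write the module's documented outputs into a Kubernetes secret in your namespace, mirroring the pattern used for the IRSA role ARN later in this guide. This is a minimal sketch: the secret name `s3-bucket-output` is an assumption, not a convention the module imposes; `bucket_arn` and `bucket_name` are documented outputs of this module.

```hcl
# Sketch only: expose the bucket details to your application via a
# Kubernetes secret. The secret name "s3-bucket-output" is illustrative.
resource "kubernetes_secret" "s3_bucket_output" {
  metadata {
    name      = "s3-bucket-output"
    namespace = var.namespace
  }

  data = {
    bucket_arn  = module.s3.bucket_arn
    bucket_name = module.s3.bucket_name
  }
}
```

Your application can then read the secret like any other namespace secret, or you can decode it with `cloud-platform decode-secret -s s3-bucket-output`.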
You can use a combination of the Cloud Platform IRSA module and Service pod module to access your source bucket using the AWS CLI.
In the cloud-platform-environments repository, within the namespace that contains your destination S3 bucket configuration, add the following Terraform, substituting values as necessary:
module "cross_irsa" {
source = "github.com/ministryofjustice/cloud-platform-terraform-irsa?ref=[latest-release-here]"
business_unit = var.business_unit
application = var.application
eks_cluster_name = var.eks_cluster_name
namespace = var.namespace
service_account_name = "${var.namespace}-cross-service"
is_production = var.is_production
team_name = var.team_name
environment_name = var.environment
infrastructure_support = var.infrastructure_support
role_policy_arns = { s3 = aws_iam_policy.s3_migrate_policy.arn }
}
data "aws_iam_policy_document" "s3_migrate_policy" {
# List & location for source & destination S3 bucket.
statement {
actions = [
"s3:ListBucket",
"s3:GetBucketLocation"
]
resources = [
module.s3_bucket.bucket_arn,
"arn:aws:s3:::[source-bucket-name]"
]
}
# Permissions on source S3 bucket contents.
statement {
actions = [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:GetObjectTagging"
]
resources = [ "arn:aws:s3:::[source-bucket-name]/*" ] # take note of trailing /* here
}
# Permissions on destination S3 bucket contents.
statement {
actions = [
"s3:PutObject",
"s3:PutObjectTagging",
"s3:GetObject",
"s3:DeleteObject"
]
resources = [ "${module.s3_bucket.bucket_arn}/*" ]
}
}
resource "aws_iam_policy" "s3_migrate_policy" {
name = "s3_migrate_policy"
policy = data.aws_iam_policy_document.s3_migrate_policy.json
tags = {
business-unit = var.business_unit
application = var.application
is-production = var.is_production
environment-name = var.environment
owner = var.team_name
infrastructure-support = var.infrastructure_support
}
}
# store irsa rolearn in k8s secret for retrieving to provide within source bucket policy
resource "kubernetes_secret" "cross_irsa" {
metadata {
name = "cross-irsa-output"
namespace = var.namespace
}
data = {
role = module.cross_irsa.role_name
rolearn = module.cross_irsa.role_arn
serviceaccount = module.cross_irsa.service_account.name
}
}
# set up the service pod
module "cross_service_pod" {
source = "github.com/ministryofjustice/cloud-platform-terraform-service-pod?ref=[latest-release-here]"
namespace = var.namespace
service_account_name = module.cross_irsa.service_account.name
}
The source bucket must explicitly permit your IRSA role to read from it.
First, retrieve the IRSA role ARN using the cloud-platform CLI and jq:
```bash
cloud-platform decode-secret -s cross-irsa-output | jq -r '.data.rolearn'
```
You should get output similar to the following:

```
arn:aws:iam::754256621582:role/cloud-platform-irsa-randomstring1234
```
Example bucket policy for the source bucket (using the ARN retrieved above):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSourceBucketAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Principal": {
        "AWS": "arn:aws:iam::754256621582:role/cloud-platform-irsa-randomstring1234"
      },
      "Resource": [
        "arn:aws:s3:::source-bucket",
        "arn:aws:s3:::source-bucket/*"
      ]
    }
  ]
}
```
Note that the bucket is listed twice; this is deliberate, not a typo: the first ARN covers the bucket itself, the second the objects within it.
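If the source bucket is itself managed with Terraform, the same policy could be attached there rather than pasted into the console. The sketch below is illustrative only: the resource name, bucket name and role ARN are placeholders taken from the example above, and the `aws_s3_bucket_policy` resource must be applied in the account that owns the source bucket.

```hcl
# Hypothetical sketch: attach the bucket policy above to a
# Terraform-managed source bucket. All names are placeholders.
resource "aws_s3_bucket_policy" "source_read_access" {
  bucket = "source-bucket"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowSourceBucketAccess"
        Effect = "Allow"
        Action = ["s3:ListBucket", "s3:GetObject"]
        Principal = {
          AWS = "arn:aws:iam::754256621582:role/cloud-platform-irsa-randomstring1234"
        }
        Resource = [
          "arn:aws:s3:::source-bucket",
          "arn:aws:s3:::source-bucket/*"
        ]
      }
    ]
  })
}
```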
Once configured, you can exec into your service pod and run the following. The sync will add new objects, update existing ones, and delete objects in the destination that are not present in the source.
```bash
kubectl exec --stdin --tty cloud-platform-7e1f25a0c851c02c-service-pod-abc123 -- /bin/sh

aws s3 sync --delete \
  s3://source_bucket_name \
  s3://destination_bucket_name \
  --source-region source_region \
  --region destination_region
```
If you have files stored in S3 that are compressed (e.g. `.zip`, `.gzip`, `.bz2`, `.p7z`), you don't need to fully download and re-upload them in order to decompress them; you can decompress them on the Cloud Platform Kubernetes cluster with a `Job`.

The following example is a `Job` pod connected to a 50Gi persistent volume (so any temporary storage does not fill up a cluster node), using `bunzip2` to decompress a `.bz2` file and re-upload it to S3.

For your needs, simply substitute the namespace, the AWS credentials (here, the IRSA service account), the bucket/filename and the compression tool; you should then be able to decompress a file of any size without having to download it locally to your machine.
```yaml
---
apiVersion: batch/v1
kind: Job
metadata:
  name: s3-decompression
  namespace: default
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: irsa-service-account-name
      restartPolicy: Never
      containers:
        - name: tools
          image: ministryofjustice/cloud-platform-tools:2.9.0
          command:
            - /bin/bash
            - -c
            - |
              cd /unpack
              aws s3 cp s3://${S3_BUCKET}/<filename>.bz2 - \
                | bunzip2 \
                | aws s3 cp - s3://${S3_BUCKET}/<filename>
          env:
            - name: S3_BUCKET
              value: <s3-bucket-name>
          resources: {}
          volumeMounts:
            - name: unpack
              mountPath: "/unpack"
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            runAsGroup: 1000
      volumes:
        - name: unpack
          persistentVolumeClaim:
            claimName: unpack-small
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: unpack-small
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: "gp2-expand"
  resources:
    requests:
      storage: 50Gi
```
For further guidance on using IRSA, for example accessing AWS buckets in different accounts, see the following links:
- Use IAM Roles for service accounts to access resources in a different AWS account
- Accessing AWS APIs and resources from your namespace
- [Cloud Platform service pod for AWS CLI access](https://user-guide.cloud-platform.service.justice.gov.uk/documentation/other-topics/cloud-platform-service-pod.html)
See the examples/ folder for more information.
Requirements:

Name | Version |
---|---|
terraform | >= 1.2.5 |
aws | >= 4.0.0 |
random | >= 3.0.0 |
Providers:

Name | Version |
---|---|
aws | >= 4.0.0 |
random | >= 3.0.0 |
Modules: none.
Resources:

Name | Type |
---|---|
aws_iam_policy.irsa | resource |
aws_s3_bucket.bucket | resource |
aws_s3_bucket_public_access_block.block_public_access | resource |
random_id.id | resource |
aws_iam_policy_document.irsa | data source |
Inputs:

Name | Description | Type | Default | Required |
---|---|---|---|---|
acl | The bucket ACL to set | string | "private" | no |
application | Application name | string | n/a | yes |
bucket_name | Set the name of the S3 bucket. If left blank, a name will be automatically generated (recommended) | string | "" | no |
bucket_policy | The S3 bucket policy to set. If empty, no policy will be set | string | "" | no |
business_unit | Area of the MOJ responsible for the service | string | n/a | yes |
cors_rule | CORS rule | any | [] | no |
enable_allow_block_pub_access | Enable whether to allow for the bucket to be blocked from public access | bool | true | no |
environment_name | Environment name | string | n/a | yes |
infrastructure_support | The team responsible for managing the infrastructure. Should be of the form \<team-name\> (\<team-email\>) | string | n/a | yes |
is_production | Whether this is used for production or not | string | n/a | yes |
lifecycle_rule | Lifecycle rule | any | [] | no |
log_path | Set the path of the logs | string | "" | no |
log_target_bucket | Set the target bucket for logs | string | "" | no |
logging_enabled | Set the logging for bucket | bool | false | no |
namespace | Namespace name | string | n/a | yes |
team_name | Team name | string | n/a | yes |
versioning | Enable object versioning for the bucket | bool | false | no |
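As an illustration of the optional inputs above, the following sketch enables versioning, access logging and an explicit ACL alongside the required tag inputs. All values are illustrative, and `my-log-bucket` is a placeholder, not a real bucket.

```hcl
# Sketch combining optional inputs from the table above.
# Values are illustrative; "my-log-bucket" is a placeholder.
module "s3" {
  source = "github.com/ministryofjustice/cloud-platform-terraform-s3-bucket?ref=version"

  # Optional S3 configuration
  acl               = "private"
  versioning        = true
  logging_enabled   = true
  log_target_bucket = "my-log-bucket"
  log_path          = "logs/"

  # Tags (required)
  business_unit          = var.business_unit
  application            = var.application
  is_production          = var.is_production
  team_name              = var.team_name
  namespace              = var.namespace
  environment_name       = var.environment
  infrastructure_support = var.infrastructure_support
}
```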
Outputs:

Name | Description |
---|---|
bucket_arn | S3 bucket ARN |
bucket_domain_name | Regional bucket domain name |
bucket_name | S3 bucket name |
irsa_policy_arn | IAM policy ARN for access to the S3 bucket |
Some of the inputs for this module are tags. All infrastructure resources must be tagged to meet the MOJ Technical Guidance on Documenting owners of infrastructure.
You should use your namespace variables to populate these. See the Usage section for more information.