Set up infrastructure in Google Cloud Platform (GCP) to launch a Kubernetes cluster and various other resources using HashiCorp Terraform.
Install Terraform using Homebrew (macOS only):
For any other OS, please follow the setup guide in the official docs.
- First, install the HashiCorp tap, a repository of all of HashiCorp's Homebrew packages.
brew tap hashicorp/tap
- Now, install Terraform with `hashicorp/tap/terraform`.
# This installs a signed binary and is automatically updated with every new official release.
brew install hashicorp/tap/terraform
- To update to the latest version of Terraform, first update Homebrew.
brew update
- Then, run the upgrade command to download and use the latest Terraform version.
brew upgrade hashicorp/tap/terraform
Verify that the installation worked by opening a new terminal session and listing Terraform's available subcommands.
terraform -help
- Add any subcommand to `terraform -help` to learn more about what it does and its available options.
terraform -help plan
NOTE: If you get an error that `terraform` could not be found, your `PATH` environment variable was not set up properly. Please go back and ensure that your `PATH` variable contains the directory where Terraform was installed.
- If you use either Bash or Zsh, you can enable tab completion for Terraform commands. To enable autocomplete, first ensure that a config file exists for your chosen shell.
# bash
touch ~/.bashrc
# zsh
touch ~/.zshrc
- Install the autocomplete package:
terraform -install-autocomplete
Install the GCP CLI using Homebrew (macOS only):
For any other OS, please follow the setup guide in the official docs.
brew install --cask google-cloud-sdk
Verify that the installation worked by opening a new terminal session and checking the `gcloud` version.
gcloud version
- If you use either Bash or Zsh, you can enable tab completion for `gcloud` commands. To enable autocomplete, first ensure that a config file exists for your chosen shell.
- The commands below also add `gcloud` to your `PATH` environment variable.
NOTE: The commands below are specific to macOS.
# add `gcloud` to path
source '/opt/homebrew/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/path.zsh.inc'
# enable tab auto-complete
source '/opt/homebrew/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/completion.zsh.inc'
The prerequisite for GCP to work with organizations is a registered `<domain>.tld`, which will be used as the organization.
We can then create folders, sub-folders, projects and assign IAM roles to these projects, or make projects inherit roles and policies through inheritance from the organization.
To set up and start working with organizations in Google Cloud, we need to perform the following steps:
- Log in to your GCP console using your personal Gmail account (the one you used to create your GCP account).
- Go to Identity and Organization. Here you will be asked to complete a checklist in order to start using organizations.
Remember, we need to have a `<domain>.tld` in order to perform the coming steps.
- To use Google Cloud, you must use a Google identity service (either Cloud Identity or Google Workspace) to administer credentials for users of your Google Cloud resources. Both Cloud Identity and Workspace provide authentication credentials that allow others to access your cloud resources.
- After establishing the Google identity service, you will use it to verify your domain. Verifying your domain automatically creates an organization resource, which is the root node of your Google Cloud resource hierarchy.
- Perform the steps to set up Cloud Identity and verify your domain.
In our case, we will be using `gcp.<domain>.tld` as the domain to verify with Google Cloud Identity, which requires us to add a `TXT` record with our `<domain>.tld` manager. We are managing `<domain>.tld` on AWS Route 53, so we will add the `gcp.<domain>.tld` `TXT` record in AWS, which will help Google Cloud verify the domain.
NOTE: Google recommends several additional configuration steps when working with organizations. For our purposes and use case, we will stick to only this setup (for now).
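Since the rest of the infrastructure is managed with Terraform, the Route 53 step above can be sketched in Terraform as well. This is only an illustrative sketch: the zone name and the verification token below are placeholders, not values from this setup.

```hcl
# Hypothetical sketch: adding the Cloud Identity TXT verification record
# in AWS Route 53. Zone name and token are placeholders -- use your own.
data "aws_route53_zone" "root" {
  name = "<domain>.tld"
}

resource "aws_route53_record" "gcp_verification" {
  zone_id = data.aws_route53_zone.root.zone_id
  name    = "gcp.<domain>.tld"
  type    = "TXT"
  ttl     = 300
  records = ["google-site-verification=<token-from-google>"]
}
```

Google shows the exact verification token during the Cloud Identity domain-verification flow; paste it into the `records` value.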
A service account represents a Google Cloud service identity, such as code running on Compute Engine VMs, App Engine apps, or systems running outside Google. We will create and use a service account to set up our infra using Terraform.
NOTE: While granting policies to a service account, we will do that at the organization level and pass the policies down to this service account via inheritance.
- In order to create a service account, we need to have a project, so we will create a `dev` folder within the `gcp.<domain>.tld` org with the project name `tf-dev-001` (or any project name you like).
- Go to `IAM and Admin` -> `Service Accounts`.
- Create a service account following the steps mentioned in the GCP console.
- Once the service account is created, we need to create service account keys. Go to the service account you created and click on the `keys` tab. In there, click on `ADD KEY`, which will download a `json` file with the service account keys.
- To use the service account, save the service account keys JSON file to `~/.config/gcloud/<service-account-keys>.json`.
- Configure the GCP CLI to use these keys:
NOTE: These keys must be kept confidential; if they are leaked or compromised, delete them and create new keys from the console.
# Configure service account in GCP CLI : https://serverfault.com/a/901950
gcloud auth activate-service-account --key-file=<service-account-keys>.json
# check active gcloud configurations
gcloud config configurations list
# login to authenticate application-default GCP account
gcloud auth application-default login
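As an alternative to the console steps above, the service account itself can also be sketched in Terraform. This is a hedged sketch, not the method used above: the project name and account id are illustrative placeholders.

```hcl
# Hypothetical sketch: creating the Terraform service account and a key
# with the google provider. Project and account_id are placeholders.
resource "google_service_account" "terraform" {
  project      = "tf-dev-001"
  account_id   = "terraform"
  display_name = "Terraform service account"
}

resource "google_service_account_key" "terraform" {
  service_account_id = google_service_account.terraform.name
}
```

Note the chicken-and-egg: some identity must already exist to run this Terraform, which is why the guide bootstraps the first service account through the console.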
Token Caching: If you have been running Terraform commands for a long time, you may want to clear any cached tokens on your machine, as they can become invalid over time. To refresh the cached token, re-run the application-default login command:
gcloud auth application-default login
We will have to grant organization-level roles that will be inherited by the service account and the root user.
All permissions we grant will be given to the organization principal, i.e., `gcp.<domain>.tld`.
Here's a list of required roles:
- `gcp.<domain>.tld`:
  - `Billing Account Creator`
  - `Organization Administrator`
  - `Organization Policy Administrator`
  - `Project Creator`
- root user:
  - `Folder Admin`
  - `Organization Administrator`
- service account:
  - `Editor`
  - `Folder Admin`
  - `Project Creator`
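Granting one of these org-level roles to the service account can be sketched in Terraform like this. The org id and the service account email are placeholders, not values from this setup.

```hcl
# Hypothetical sketch: binding one of the roles above at the organization
# level. Replace org_id and the member email with your own values.
resource "google_organization_iam_member" "sa_project_creator" {
  org_id = "<your-org-id>"
  role   = "roles/resourcemanager.projectCreator"
  member = "serviceAccount:terraform@tf-dev-001.iam.gserviceaccount.com"
}
```

One `google_organization_iam_member` resource is needed per role; the same pattern applies to the root user with a `user:` member prefix.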
To `ssh` into the VM instance, we will have to add the public SSH key into the project `metadata`.
This can also be done via Terraform.
- Using project metadata resource
/*
A key set in project metadata is propagated to every instance in the project.
This resource configuration is prone to causing frequent diffs as Google adds SSH Keys when the SSH Button is pressed in the console.
It is better to use OS Login instead.
*/
resource "google_compute_project_metadata" "my_ssh_key" {
metadata = {
ssh-keys = <<EOF
dev:ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILg6UtHDNyMNAh0GjaytsJdrUxjtLy3APXqZfNZhvCeT dev
foo:ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILg6UtHDNyMNAh0GjaytsJdrUxjtLy3APXqZfNZhvCeT bar
EOF
}
}
- Using instance metadata resource
resource "google_compute_instance" "vm" {
metadata = {
ssh-keys = "username:${file("~/.ssh/<ssh-key>.pub")}"
}
}
NOTE: We can also pass the contents of a public key file (via `file()`, as shown above) in the `ssh-keys` metadata instead of a plain-text public SSH key.
- OS Login at the user level (recommended method)
Follow the Terraform documentation on OS Login for how to set up the `google_os_login_ssh_public_key` resource. Once complete, we need to set the metadata property `enable-oslogin` to `TRUE` to enable user login using SSH.
To use OS Login, we need to create the infrastructure on Terraform using the `application-default` user.
resource "google_compute_instance" "vm" {
metadata = {
# Enable os-login through metadata
enable-oslogin : "TRUE"
}
}
NOTE: Since we do not set a default username with OS Login, we need to use `USERNAME_DOMAIN_SUFFIX` to ssh into the VM. Reference
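A minimal sketch of the `google_os_login_ssh_public_key` resource mentioned above, assuming the key pair already exists locally; the user email and key path are placeholders.

```hcl
# Hypothetical sketch: registering a public key for OS Login.
# The user email and the key filename are placeholders.
resource "google_os_login_ssh_public_key" "workstation" {
  user = "user@<domain>.tld"
  key  = file("~/.ssh/<ssh-key>.pub")
}
```

The `user` here is the Google identity (the application-default user), which is also where the `USERNAME_DOMAIN_SUFFIX` login name comes from.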
To generate the ssh key locally on your workstation, use the following command:
NOTE: It is recommended that you create and store the SSH keys in the `~/.ssh` directory.
# follow the on-screen steps after running the command
# avoid adding a passphrase
ssh-keygen -t rsa -b 2048 -C <username>
Once the public SSH key has been added to the VM instance metadata, we can use the external IP to connect to the VM instance.
Use the below command to connect to the instance:
ssh -i <path-to-private-key> <username>@<external-ip>
# if os login is enabled:
ssh -i <path-to-private-key> <USERNAME_DOMAIN_SUFFIX>@<external-ip>
# example (note: pass the private key, not the `.pub` file): ssh -i ~/.ssh/gcp-compute [email protected]
In order to create resources on GCP, we will have to enable some basic APIs. This can be done via Terraform.
Below is a non-exhaustive list of APIs that can come in handy:
compute.googleapis.com
storage.googleapis.com
container.googleapis.com
orgpolicy.googleapis.com
NOTE: Remember to add timed delays (using the `time_sleep` resource) to the GCP resources when creating them via Terraform.
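A sketch of enabling one of the APIs above together with such a delay; the project name and wait duration are illustrative, and `time_sleep` assumes the `hashicorp/time` provider.

```hcl
# Hypothetical sketch: enable an API, then wait before dependent
# resources use it. Project name and duration are placeholders.
resource "google_project_service" "compute" {
  project = "tf-dev-001"
  service = "compute.googleapis.com"
}

# Newly enabled APIs can take a while to propagate; pause before
# creating resources that depend on them.
resource "time_sleep" "wait_for_apis" {
  depends_on      = [google_project_service.compute]
  create_duration = "60s"
}
```

Downstream resources can then declare `depends_on = [time_sleep.wait_for_apis]` so they are only created after the pause.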
- Initialize Terraform. This installs the required providers and other plugins for our infrastructure.
# run in the `root` dir
terraform init
- Create a `<filename>.tfvars` using the `example.tfvars` template.
- Validate the Terraform configuration:
terraform validate
- Plan the cloud infrastructure. This command shows how many resources will be created, deleted or modified when we run `terraform apply`.
NOTE: Remember to set your AWS profile in the terminal to run the commands going forward.
export AWS_PROFILE=root
terraform plan -var-file="<filename>.tfvars"
- Apply the changes/updates to the infrastructure to create it:
# execute the tf plan
# `--auto-approve` prevents tf from prompting you to say y/n to apply the plan
terraform apply --auto-approve -var-file="<filename>.tfvars"
- To destroy your infrastructure, use the command:
terraform destroy --auto-approve -var-file="<filename>.tfvars"
NOTE: This is the recommended best practice.
This is a storage location within GCP from where we access our `.tfstate` file.
All the information about the infrastructure resources is recorded in the `.tfstate` file when we run `terraform apply`. So the next time we run `terraform apply`, it will only compare the desired state to the actual state.
If we do not use a `backend` to store our `.tfstate` file, it is stored locally on a server (if we provision our infrastructure through a server) or on our local development workstation. The `.tfstate` file may also contain confidential credentials. In order to avoid these problems, it is recommended to use a terraform `backend` to store the `.tfstate` file.
Now, when we run the `terraform apply` command, the `.tfstate` will be accessed through the storage bucket.
NOTE: The terraform `backend` block does not allow the use of tfvars, so we hardcode these values in the configuration.
# https://developer.hashicorp.com/terraform/language/settings/backends/gcs
terraform {
backend "gcs" {
bucket = "tf-state-prod"
prefix = "terraform/state"
}
}
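The bucket referenced by the backend must already exist before `terraform init` is run. A sketch of creating it, with versioning enabled so earlier state revisions stay recoverable; the bucket name and location are placeholders.

```hcl
# Hypothetical sketch: the GCS bucket that will hold the remote state.
# Name and location are placeholders.
resource "google_storage_bucket" "tf_state" {
  name     = "tf-state-prod"
  location = "US"

  # Keep old state revisions recoverable.
  versioning {
    enabled = true
  }
}
```

In practice this bucket is created once (via console, `gsutil mb`, or a separate bootstrap config) since the backend cannot depend on state it has not stored yet.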
To run the Kubernetes cluster on Google Kubernetes Engine, we use the `google_container_cluster` resource to define the cluster configuration.
To provision a GKE cluster on Google Cloud, refer here. See the Using GKE with Terraform guide for more information about using GKE with Terraform.
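As a rough illustration of the `google_container_cluster` resource, here is a minimal sketch; the names, region, machine type, and node count are placeholders, and the guides linked above cover production-grade settings (private clusters, VPC, etc.).

```hcl
# Hypothetical sketch: a small GKE cluster with its own node pool.
# Names, location, and sizes are placeholders.
resource "google_container_cluster" "primary" {
  name     = "primary"
  location = "<your-gke-region>"

  # Manage nodes in a separate pool rather than the default one.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "general" {
  name       = "general"
  cluster    = google_container_cluster.primary.name
  location   = google_container_cluster.primary.location
  node_count = 1

  node_config {
    machine_type = "e2-medium"
  }
}
```

Removing the default node pool and managing a separate `google_container_node_pool` is the common pattern, since node pool changes then don't force cluster recreation.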
Once the setup is configured, we need to connect to the bastion host in order to interact with the private GKE cluster. Follow the steps below to connect to the bastion host via your local terminal:
- Install the Google `gke-gcloud-auth-plugin` locally:
# assuming you already have the gcloud-sdk installed:
gcloud components install gke-gcloud-auth-plugin
- Get the cluster configuration to be written into `~/.kube/config`:
gcloud container clusters get-credentials primary \
--region=<your-gke-region> \
--project=<your-gke-project-name>
- Connect to the tunnel via SSH:
ssh -i ~/.ssh/<private-key> <username>@<compute-instance-ip> -L 8888:127.0.0.1:8888 -N -q -f
- Configure `HTTPS_PROXY` to point to `localhost:8888`:
export HTTPS_PROXY=localhost:8888
- Confirm connection to the GKE cluster:
kubectl get all
kubectl get ns
Following the latest changes to the standard GKE cluster, kubectl authentication has changed in GKE v1.26; starting with that version, users will have to install a new kubectl plugin called `gke-gcloud-auth-plugin`.
Existing versions of kubectl and custom Kubernetes clients contain provider-specific code to manage authentication between the client and Google Kubernetes Engine. Starting with v1.26, this code will no longer be included as part of the OSS kubectl. GKE users will need to download and use a separate authentication plugin to generate GKE-specific tokens. This new binary, `gke-gcloud-auth-plugin`, uses the Kubernetes Client-go Credential Plugin mechanism to extend kubectl's authentication to support GKE. Because plugins are already supported by kubectl, you can switch to the new mechanism now, before v1.26 becomes available.
Below are the installation instructions and technical details of this new binary.
You will need to install the gke-gcloud-auth-plugin binary on all systems where kubectl or Kubernetes custom clients are used.
NOTE: Customers using `apt-get install` may need to set up the Google Cloud SDK repository source, if not already set up for other Cloud SDK component installations.
- Before installing the kubectl auth plugin, make sure that your operating system meets the following requirements:
# make sure the Ubuntu/Debian image has not reached its end of life
# refresh the package indexes
sudo apt-get update -y
# make sure `apt-transport-https` and `sudo` are installed
sudo apt-get install apt-transport-https ca-certificates gnupg curl sudo -y
- Install the Google Cloud SDK repository source:
# For newer distributions (Debian 9+ or Ubuntu 18.04+) run the following command:
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo \
gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg
# For older distributions, run the following command:
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo \
apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
# If your distribution's apt-key command doesn't support the --keyring argument, run the following command:
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
- Add the gcloud CLI distribution URI as a package source
# For newer distributions (Debian 9+ or Ubuntu 18.04+), run the following command
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo \
tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
# For older distributions that don't support the signed-by option, run the following command:
echo "deb https://packages.cloud.google.com/apt cloud-sdk main" | sudo \
tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
NOTE: Make sure you don't have duplicate entries for the cloud-sdk repo in /etc/apt/sources.list.d/google-cloud-sdk.list.
- Finally, run the following command to install the plugin:
sudo apt-get update -y
sudo apt-get install google-cloud-sdk-gke-gcloud-auth-plugin -y