Terraform module for connecting an AKS cluster to CAST AI

Requirements

Terraform 0.13+

Using the module

This module creates an Azure role and a service principal that can be used to connect an AKS cluster to CAST AI.

Requires castai/castai, hashicorp/azurerm, hashicorp/azuread, hashicorp/helm providers to be configured.
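One way to satisfy this requirement is sketched below. It is a minimal sketch, not the module's official setup: the version constraints are copied from the Requirements table in this README, `var.castai_api_token` is a hypothetical variable, and `azurerm_kubernetes_cluster.example` is assumed to be defined elsewhere in your configuration. The nested `kubernetes { ... }` block follows the helm provider 2.x syntax, so the upper bound on helm is an assumption of this sketch.

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    castai  = { source = "castai/castai", version = "~> 7.14" }
    azurerm = { source = "hashicorp/azurerm", version = ">= 3.7.0" }
    azuread = { source = "hashicorp/azuread", version = ">= 2.22.0" }
    # Upper bound is an assumption: the provider block below uses 2.x syntax.
    helm = { source = "hashicorp/helm", version = ">= 2.0.0, < 3.0.0" }
  }
}

# Hypothetical variable holding a CAST AI API access key.
variable "castai_api_token" {
  type = string
}

provider "castai" {
  api_token = var.castai_api_token
}

provider "azurerm" {
  # Newer azurerm releases may also need subscription_id here
  # or in the ARM_SUBSCRIPTION_ID environment variable.
  features {}
}

provider "azuread" {}

provider "helm" {
  kubernetes {
    # Credentials are read from the AKS cluster resource (assumed name: example).
    host                   = azurerm_kubernetes_cluster.example.kube_config[0].host
    client_certificate     = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_certificate)
    client_key             = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].client_key)
    cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.example.kube_config[0].cluster_ca_certificate)
  }
}
```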

The required parameters can be provided manually or taken from your AKS cluster resource and the Azure RM subscription data source.

module "castai-aks-cluster" {
  source = "castai/aks/castai"

  aks_cluster_name    = var.aks_cluster_name
  aks_cluster_region  = var.aks_cluster_region
  node_resource_group = azurerm_kubernetes_cluster.example.node_resource_group
  resource_group      = azurerm_kubernetes_cluster.example.resource_group_name

  delete_nodes_on_disconnect = true

  subscription_id = data.azurerm_subscription.current.subscription_id
  tenant_id       = data.azurerm_subscription.current.tenant_id

  default_node_configuration = module.castai-aks-cluster.castai_node_configurations["default"]

  node_configurations = {
    default = {
      disk_cpu_ratio = 25
      subnets        = [azurerm_subnet.internal.id]
      tags           = {
        "node-config" : "default"
      }
    }
  }
  node_templates = {
    spot_tmpl = {
      configuration_id = module.castai-aks-cluster.castai_node_configurations["default"]

      should_taint = true

      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
        custom-label-key-2 = "custom-label-value-2"
      }

      custom_taints = [
        {
          key   = "custom-taint-key-1"
          value = "custom-taint-value-1"
        },
        {
          key   = "custom-taint-key-2"
          value = "custom-taint-value-2"
        }
      ]

      constraints = {
        fallback_restore_rate_seconds = 1800
        spot                          = true
        use_spot_fallbacks            = true
        min_cpu                       = 4
        max_cpu                       = 100
        instance_families = {
          exclude = ["standard_DPLSv5"]
        }
        compute_optimized_state = "disabled"
        storage_optimized_state = "disabled"
      }
    }
  }

  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true

      headroom = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }

      headroom_spot = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}
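The example above refers to `data.azurerm_subscription.current` and `azurerm_kubernetes_cluster.example` without declaring them. A minimal sketch of the supporting declarations follows, assuming the AKS cluster is managed in the same configuration under the name `example`. The `locals` names are hypothetical and only illustrate how the cluster name and region could be read from the resource instead of from variables.

```hcl
# Subscription that the azurerm provider is authenticated against.
# Exposes subscription_id and tenant_id, as used in the module call above.
data "azurerm_subscription" "current" {}

# Hypothetical locals: values derived from the cluster resource,
# usable in place of var.aks_cluster_name and var.aks_cluster_region.
locals {
  aks_cluster_name   = azurerm_kubernetes_cluster.example.name
  aks_cluster_region = azurerm_kubernetes_cluster.example.location
}
```

If the cluster is managed outside this configuration, the `azurerm_kubernetes_cluster` data source may be an alternative way to look up the same attributes.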

Migrating from 2.x.x to 3.x.x

Version 3.x.x changed:

Removed custom_label attribute in castai_node_template resource. Use custom_labels instead.

Old configuration:

module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_label = {
        key   = "custom-label-key-1"
        value = "custom-label-value-1"
      }
    }
  }
}

New configuration:

module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
      }
    }
  }
}

Migrating from 3.x.x to 4.x.x

Version 4.x.x changed:

Removed compute_optimized and storage_optimized attributes in castai_node_template resource, constraints object. Use compute_optimized_state and storage_optimized_state instead.

Old configuration:

module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized = false
        storage_optimized = true
      }
    }
  }
}

New configuration:

module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized_state = "disabled"
        storage_optimized_state = "enabled"
      }
    }
  }
}

Migrating from 5.0.x to 5.2.x

Version 5.2.x changed:

Deprecated autoscaler_policies_json attribute. Use autoscaler_settings instead.

Old configuration:

module "castai-aks-cluster" {
  autoscaler_policies_json = <<-EOT
    {
        "enabled": true,
        "unschedulablePods": {
            "enabled": true
        },
        "nodeDownscaler": {
            "enabled": true,
            "emptyNodes": {
                "enabled": true
            },
            "evictor": {
                "aggressiveMode": false,
                "cycleInterval": "5m10s",
                "dryRun": false,
                "enabled": true,
                "nodeGracePeriodMinutes": 10,
                "scopedMode": false
            }
        },
        "nodeTemplatesPartialMatchingEnabled": false,
        "clusterLimits": {
            "cpu": {
                "maxCores": 20,
                "minCores": 1
            },
            "enabled": true
        }
    }
  EOT
}

New configuration:

module "castai-aks-cluster" {
  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}

Examples

Usage examples are located in the Terraform provider repository.

Requirements

Name	Version
terraform	>= 0.13
azuread	>= 2.22.0
azurerm	>= 3.7.0
castai	~> 7.14
helm	>= 2.0.0

Providers

Name	Version
azuread	>= 2.22.0
azurerm	>= 3.7.0
castai	~> 7.14
helm	>= 2.0.0
null	n/a

Modules

No modules.

Resources

Name	Type
azuread_application.castai	resource
azuread_application_password.castai	resource
azuread_service_principal.castai	resource
azurerm_role_assignment.castai_additional_resource_groups	resource
azurerm_role_assignment.castai_node_resource_group	resource
azurerm_role_assignment.castai_resource_group	resource
azurerm_role_definition.castai	resource
castai_aks_cluster.castai_cluster	resource
castai_autoscaler.castai_autoscaler_policies	resource
castai_node_configuration.this	resource
castai_node_configuration_default.this	resource
castai_node_template.this	resource
castai_workload_scaling_policy.this	resource
helm_release.castai_agent	resource
helm_release.castai_cluster_controller	resource
helm_release.castai_cluster_controller_self_managed	resource
helm_release.castai_evictor	resource
helm_release.castai_evictor_ext	resource
helm_release.castai_evictor_self_managed	resource
helm_release.castai_kvisor	resource
helm_release.castai_kvisor_self_managed	resource
helm_release.castai_pod_pinner	resource
helm_release.castai_pod_pinner_self_managed	resource
helm_release.castai_spot_handler	resource
helm_release.castai_workload_autoscaler	resource
helm_release.castai_workload_autoscaler_self_managed	resource
null_resource.wait_for_cluster	resource
azuread_client_config.current	data source

Inputs

Name	Description	Type	Default	Required
additional_resource_groups	n/a	`list(string)`	`[]`	no
agent_values	List of YAML formatted string values for agent helm chart	`list(string)`	`[]`	no
agent_version	Version of castai-agent helm chart. If not provided, latest version will be used.	`string`	`null`	no
aks_cluster_name	Name of the cluster to be connected to CAST AI.	`string`	n/a	yes
aks_cluster_region	Region of the AKS cluster	`string`	n/a	yes
api_grpc_addr	CAST AI GRPC API address	`string`	`"api-grpc.cast.ai:443"`	no
api_url	URL of alternative CAST AI API to be used during development or testing	`string`	`"https://api.cast.ai"`	no
autoscaler_policies_json	Optional json object to override CAST AI cluster autoscaler policies. Deprecated, use `autoscaler_settings` instead.	`string`	`null`	no
autoscaler_settings	Optional Autoscaler policy definitions to override current autoscaler settings	`any`	`null`	no
castai_api_token	Optional CAST AI API token created in console.cast.ai API Access keys section. Used only when `wait_for_cluster_ready` is set to true	`string`	`""`	no
castai_components_labels	Optional additional Kubernetes labels for CAST AI pods	`map(any)`	`{}`	no
castai_components_sets	Optional additional 'set' configurations for helm resources.	`map(string)`	`{}`	no
cluster_controller_values	List of YAML formatted string values for cluster-controller helm chart	`list(string)`	`[]`	no
cluster_controller_version	Version of castai-cluster-controller helm chart. If not provided, latest version will be used.	`string`	`null`	no
default_node_configuration	ID of the default node configuration	`string`	`""`	no
default_node_configuration_name	Name of the default node configuration	`string`	`""`	no
delete_nodes_on_disconnect	Optionally delete CAST AI created nodes when the cluster is destroyed	`bool`	`false`	no
evictor_ext_values	List of YAML formatted string with evictor-ext values	`list(string)`	`[]`	no
evictor_ext_version	Version of castai-evictor-ext chart. Default latest	`string`	`null`	no
evictor_values	List of YAML formatted string values for evictor helm chart	`list(string)`	`[]`	no
evictor_version	Version of castai-evictor chart. If not provided, latest version will be used.	`string`	`null`	no
grpc_url	gRPC endpoint used by pod-pinner	`string`	`"grpc.cast.ai:443"`	no
install_security_agent	Optional flag for installation of security agent (https://docs.cast.ai/product-overview/console/security-insights/)	`bool`	`false`	no
install_workload_autoscaler	Optional flag for installation of workload autoscaler (https://docs.cast.ai/docs/workload-autoscaling-configuration)	`bool`	`false`	no
kvisor_controller_extra_args	Extra arguments for the kvisor controller. Optionally enable kvisor to lint Kubernetes YAML manifests, scan workload images and check if workloads pass CIS Kubernetes Benchmarks as well as NSA, OWASP and PCI recommendations.	`map(string)`	`{ "image-scan-enabled": "true", "kube-bench-enabled": "true", "kube-linter-enabled": "true" }`	no
kvisor_values	List of YAML formatted string values for kvisor helm chart	`list(string)`	`[]`	no
kvisor_version	Version of kvisor chart. If not provided, latest version will be used.	`string`	`null`	no
node_configurations	Map of AKS node configurations to create	`any`	`{}`	no
node_resource_group	n/a	`string`	n/a	yes
node_templates	Map of node templates to create	`any`	`{}`	no
pod_pinner_values	List of YAML formatted string values for pod-pinner helm chart	`list(string)`	`[]`	no
pod_pinner_version	Version of pod-pinner helm chart. Default latest	`string`	`null`	no
resource_group	n/a	`string`	n/a	yes
self_managed	Whether upgrades of CAST AI components are managed by the customer; by default, upgrades are managed by the CAST AI central system.	`bool`	`false`	no
spot_handler_values	List of YAML formatted string values for spot-handler helm chart	`list(string)`	`[]`	no
spot_handler_version	Version of castai-spot-handler helm chart. If not provided, latest version will be used.	`string`	`null`	no
subscription_id	Azure subscription ID	`string`	n/a	yes
tenant_id	n/a	`string`	n/a	yes
wait_for_cluster_ready	Wait for the cluster to be ready before finishing module execution. This option requires `castai_api_token` to be set	`bool`	`false`	no
workload_autoscaler_values	List of YAML formatted string with cluster-workload-autoscaler values	`list(string)`	`[]`	no
workload_autoscaler_version	Version of castai-workload-autoscaler helm chart. Default latest	`string`	`null`	no
workload_scaling_policies	Map of workload scaling policies to create	`any`	`{}`	no
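
As the table notes, `wait_for_cluster_ready` only works together with `castai_api_token`. A hedged sketch of how these optional inputs might be combined is shown below. `var.castai_api_token` is a hypothetical variable, and the required inputs are left out for brevity, as in the migration examples.

```hcl
module "castai-aks-cluster" {
  source = "castai/aks/castai"

  # Required inputs (aks_cluster_name, aks_cluster_region, resource_group,
  # node_resource_group, subscription_id, tenant_id) are omitted here.

  # Wait until the cluster is ready; this needs an API token.
  wait_for_cluster_ready = true
  castai_api_token       = var.castai_api_token

  # Optional components, both disabled by default.
  install_workload_autoscaler = true
  install_security_agent      = true
}
```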

Outputs

Name	Description
castai_node_configurations	Map of node configuration IDs by name
castai_node_templates	Map of node templates by name
cluster_id	CAST AI cluster ID, which can be used for accessing cluster data using the API
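
The outputs can be re-exported or passed to other resources from the calling configuration. The following is a small sketch under assumptions: the module is instantiated as `castai-aks-cluster`, and a node configuration named `default` exists, as in the usage example. The output names are hypothetical.

```hcl
# Re-export the CAST AI cluster ID, for example for use in API calls.
output "castai_cluster_id" {
  value = module.castai-aks-cluster.cluster_id
}

# Look up one node configuration ID by its name in the output map.
output "default_node_configuration_id" {
  value = module.castai-aks-cluster.castai_node_configurations["default"]
}
```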
