Skip to content

Terraform module for connecting an AKS cluster to CAST AI

License

Notifications You must be signed in to change notification settings

castai/terraform-castai-aks

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

98 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Terraform module for connecting a AKS cluster to CAST AI

Website: https://www.cast.ai

Requirements

Using the module

A module to create Azure role and a service principal that can be used to connect to CAST AI

Requires castai/castai, hashicorp/azurerm, hashicorp/azuread, hashicorp/helm providers to be configured.

The required parameters can be provided manually or alternatively can be easily acquired from your AKS cluster resource or Azure RM subscription data source.

module "castai-aks-cluster" {
  source = "castai/aks/castai"

  aks_cluster_name    = var.aks_cluster_name
  aks_cluster_region  = var.aks_cluster_region
  node_resource_group = azurerm_kubernetes_cluster.example.node_resource_group
  resource_group      = azurerm_kubernetes_cluster.example.resource_group_name

  delete_nodes_on_disconnect = true

  subscription_id = data.azurerm_subscription.current.subscription_id
  tenant_id       = data.azurerm_subscription.current.tenant_id

  default_node_configuration = module.castai-aks-cluster.castai_node_configurations["default"]

  node_configurations = {
    default = {
      disk_cpu_ratio = 25
      subnets        = [azurerm_subnet.internal.id]
      tags           = {
        "node-config" : "default"
      }
    }
  }
  node_templates = {
    spot_tmpl = {
      configuration_id = module.castai-aks-cluster.castai_node_configurations["default"]

      should_taint = true

      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
        custom-label-key-2 = "custom-label-value-2"
      }

      custom_taints = [
        {
          key = "custom-taint-key-1"
          value = "custom-taint-value-1"
        },
        {
          key = "custom-taint-key-2"
          value = "custom-taint-value-2"
        }
      ]

      constraints = {
        fallback_restore_rate_seconds = 1800
        spot = true
        use_spot_fallbacks = true
        min_cpu = 4
        max_cpu = 100
        instance_families = {
          exclude = ["standard_DPLSv5"]
        }
        compute_optimized_state = "disabled"
        storage_optimized_state = "disabled"
      }
    }
  }

  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true

      headroom = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }

      headroom_spot = {
        enabled           = true
        cpu_percentage    = 10
        memory_percentage = 10
      }
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5s10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}

Migrating from 2.x.x to 3.x.x

Version 3.x.x changes:

  • Removed custom_label attribute in castai_node_template resource. Use custom_labels instead.

Old configuration:

module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_label = {
        key = "custom-label-key-1"
        value = "custom-label-value-1"
      }
    }
  }
}

New configuration:

module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      custom_labels = {
        custom-label-key-1 = "custom-label-value-1"
      }
    }
  }
}

Migrating from 3.x.x to 4.x.x

Version 4.x.x changed:

  • Removed compute_optimized and storage_optimized attributes in castai_node_template resource, constraints object. Use compute_optimized_state and storage_optimized_state instead.

Old configuration:

module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized = false
        storage_optimized = true
      }
    }
  }
}

New configuration:

module "castai-aks-cluster" {
  node_templates = {
    spot_tmpl = {
      constraints = {
        compute_optimized_state = "disabled"
        storage_optimized_state = "enabled"
      }
    }
  }
}

Migrating from 5.0.x to 5.2.x

Version 5.2.x changed:

  • Deprecated autoscaler_policies_json attribute. Use autoscaler_settings instead.

Old configuration:

module "castai-aks-cluster" {
  autoscaler_policies_json = <<-EOT
    {
        "enabled": true,
        "unschedulablePods": {
            "enabled": true
        },
        "nodeDownscaler": {
            "enabled": true,
            "emptyNodes": {
                "enabled": true
            },
            "evictor": {
                "aggressiveMode": false,
                "cycleInterval": "5m10s",
                "dryRun": false,
                "enabled": true,
                "nodeGracePeriodMinutes": 10,
                "scopedMode": false
            }
        },
        "nodeTemplatesPartialMatchingEnabled": false,
        "clusterLimits": {
            "cpu": {
                "maxCores": 20,
                "minCores": 1
            },
            "enabled": true
        }
    }
  EOT
}

New configuration:

module "castai-aks-cluster" {
  autoscaler_settings = {
    enabled                                 = true
    node_templates_partial_matching_enabled = false

    unschedulable_pods = {
      enabled = true
    }

    node_downscaler = {
      enabled = true

      empty_nodes = {
        enabled = true
      }

      evictor = {
        aggressive_mode           = false
        cycle_interval            = "5m10s"
        dry_run                   = false
        enabled                   = true
        node_grace_period_minutes = 10
        scoped_mode               = false
      }
    }

    cluster_limits = {
      enabled = true

      cpu = {
        max_cores = 20
        min_cores = 1
      }
    }
  }
}

Examples

Usage examples are located in terraform provider repo

Requirements

Name Version
terraform >= 0.13
azuread >= 2.22.0
azurerm >= 3.7.0
castai ~> 7.14
helm >= 2.0.0

Providers

Name Version
azuread >= 2.22.0
azurerm >= 3.7.0
castai ~> 7.14
helm >= 2.0.0
null n/a

Modules

No modules.

Resources

Name Type
azuread_application.castai resource
azuread_application_password.castai resource
azuread_service_principal.castai resource
azurerm_role_assignment.castai_additional_resource_groups resource
azurerm_role_assignment.castai_node_resource_group resource
azurerm_role_assignment.castai_resource_group resource
azurerm_role_definition.castai resource
castai_aks_cluster.castai_cluster resource
castai_autoscaler.castai_autoscaler_policies resource
castai_node_configuration.this resource
castai_node_configuration_default.this resource
castai_node_template.this resource
castai_workload_scaling_policy.this resource
helm_release.castai_agent resource
helm_release.castai_cluster_controller resource
helm_release.castai_cluster_controller_self_managed resource
helm_release.castai_evictor resource
helm_release.castai_evictor_ext resource
helm_release.castai_evictor_self_managed resource
helm_release.castai_kvisor resource
helm_release.castai_kvisor_self_managed resource
helm_release.castai_pod_pinner resource
helm_release.castai_pod_pinner_self_managed resource
helm_release.castai_spot_handler resource
helm_release.castai_workload_autoscaler resource
helm_release.castai_workload_autoscaler_self_managed resource
null_resource.wait_for_cluster resource
azuread_client_config.current data source

Inputs

Name Description Type Default Required
additional_resource_groups n/a list(string) [] no
agent_values List of YAML formatted string values for agent helm chart list(string) [] no
agent_version Version of castai-agent helm chart. If not provided, latest version will be used. string null no
aks_cluster_name Name of the cluster to be connected to CAST AI. string n/a yes
aks_cluster_region Region of the AKS cluster string n/a yes
api_grpc_addr CAST AI GRPC API address string "api-grpc.cast.ai:443" no
api_url URL of alternative CAST AI API to be used during development or testing string "https://api.cast.ai" no
autoscaler_policies_json Optional json object to override CAST AI cluster autoscaler policies. Deprecated, use autoscaler_settings instead. string null no
autoscaler_settings Optional Autoscaler policy definitions to override current autoscaler settings any null no
castai_api_token Optional CAST AI API token created in console.cast.ai API Access keys section. Used only when wait_for_cluster_ready is set to true string "" no
castai_components_labels Optional additional Kubernetes labels for CAST AI pods map(any) {} no
castai_components_sets Optional additional 'set' configurations for helm resources. map(string) {} no
cluster_controller_values List of YAML formatted string values for cluster-controller helm chart list(string) [] no
cluster_controller_version Version of castai-cluster-controller helm chart. If not provided, latest version will be used. string null no
default_node_configuration ID of the default node configuration string "" no
default_node_configuration_name Name of the default node configuration string "" no
delete_nodes_on_disconnect Optionally delete Cast AI created nodes when the cluster is destroyed bool false no
evictor_ext_values List of YAML formatted string with evictor-ext values list(string) [] no
evictor_ext_version Version of castai-evictor-ext chart. Default latest string null no
evictor_values List of YAML formatted string values for evictor helm chart list(string) [] no
evictor_version Version of castai-evictor chart. If not provided, latest version will be used. string null no
grpc_url gRPC endpoint used by pod-pinner string "grpc.cast.ai:443" no
install_security_agent Optional flag for installation of security agent (https://docs.cast.ai/product-overview/console/security-insights/) bool false no
install_workload_autoscaler Optional flag for installation of workload autoscaler (https://docs.cast.ai/docs/workload-autoscaling-configuration) bool false no
kvisor_controller_extra_args Extra arguments for the kvisor controller. Optionally enable kvisor to lint Kubernetes YAML manifests, scan workload images and check if workloads pass CIS Kubernetes Benchmarks as well as NSA, WASP and PCI recommendations. map(string)
{
"image-scan-enabled": "true",
"kube-bench-enabled": "true",
"kube-linter-enabled": "true"
}
no
kvisor_values List of YAML formatted string values for kvisor helm chart list(string) [] no
kvisor_version Version of kvisor chart. If not provided, latest version will be used. string null no
node_configurations Map of AKS node configurations to create any {} no
node_resource_group n/a string n/a yes
node_templates Map of node templates to create any {} no
pod_pinner_values List of YAML formatted string values for agent helm chart list(string) [] no
pod_pinner_version Version of pod-pinner helm chart. Default latest string null no
resource_group n/a string n/a yes
self_managed Whether CAST AI components' upgrades are managed by a customer; by default upgrades are managed CAST AI central system. bool false no
spot_handler_values List of YAML formatted string values for spot-handler helm chart list(string) [] no
spot_handler_version Version of castai-spot-handler helm chart. If not provided, latest version will be used. string null no
subscription_id Azure subscription ID string n/a yes
tenant_id n/a string n/a yes
wait_for_cluster_ready Wait for cluster to be ready before finishing the module execution, this option requires castai_api_token to be set bool false no
workload_autoscaler_values List of YAML formatted string with cluster-workload-autoscaler values list(string) [] no
workload_autoscaler_version Version of castai-workload-autoscaler helm chart. Default latest string null no
workload_scaling_policies Map of workload scaling policies to create any {} no

Outputs

Name Description
castai_node_configurations Map of node configurations ids by name
castai_node_templates Map of node template by name
cluster_id CAST.AI cluster id, which can be used for accessing cluster data using API