Upgrade to 4.34.0 breaks on node_pool_auto_config block with google-beta provider #12422

@lauraseidler

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v1.2.8
on linux_amd64
+ provider registry.terraform.io/hashicorp/external v2.2.2
+ provider registry.terraform.io/hashicorp/google v4.34.0
+ provider registry.terraform.io/hashicorp/google-beta v4.34.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.13.1
+ provider registry.terraform.io/hashicorp/null v3.1.1
+ provider registry.terraform.io/hashicorp/random v3.3.2
+ provider registry.terraform.io/hashicorp/time v0.8.0

Affected Resource(s)

  • google_container_cluster

Terraform Configuration Files

We're using a module within a module within a module to create the cluster (terraform-google-modules/kubernetes-engine/google//modules/safer-cluster-update-variant). I can try to produce a minimal example with just the resource if necessary, but I'm unsure whether it's relevant; maybe this is already enough.
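
For illustration only, here is a rough, hypothetical sketch (names, project, location, and limits are placeholders, not taken from the actual module output) of the kind of cluster that seems to hit this: a cluster with node auto-provisioning (or Autopilot) enabled and no node_pool_auto_config block declared.

resource "google_container_cluster" "example" {
  provider = google-beta
  project  = "example-project"
  name     = "example-cluster"
  location = "europe-west4"

  initial_node_count       = 1
  remove_default_node_pool = true

  # Node auto-provisioning enabled; Autopilot clusters reportedly hit the same issue.
  cluster_autoscaling {
    enabled = true
    resource_limits {
      resource_type = "cpu"
      minimum       = 1
      maximum       = 8
    }
    resource_limits {
      resource_type = "memory"
      minimum       = 1
      maximum       = 32
    }
  }

  # node_pool_auto_config is intentionally not declared here; after upgrading to
  # provider 4.34.0 the plan reports the block as changed outside of Terraform,
  # and the apply fails with "block count changed from 0 to 1".
}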

Debug Output

Panic Output

Expected Behavior

Plan shows that changes outside of Terraform were made:

  # module.project.module.gke.module.gke.google_container_cluster.primary has changed
  ~ resource "google_container_cluster" "primary" {
        id                          = "projects/***/locations/europe-west4/clusters/***"
        name                        = "***"
        # (28 unchanged attributes hidden)

      + node_pool_auto_config {
        }

        # (22 unchanged blocks hidden)
    }

With this change to be rolled back:

  # module.project.module.gke.module.gke.google_container_cluster.primary will be updated in-place
  ~ resource "google_container_cluster" "primary" {
        id                          = "projects/***/locations/europe-west4/clusters/***"
        name                        = "***"
        # (28 unchanged attributes hidden)

      - node_pool_auto_config {
        }

        # (22 unchanged blocks hidden)
    }

I would expect this to either succeed, or the change outside of Terraform to not happen in the first place.

Actual Behavior

Apply fails with the following error:

Error: Provider produced inconsistent final plan

When expanding the plan for
module.project.module.gke.module.gke.google_container_cluster.primary to
include new values learned so far during apply, provider
"registry.terraform.io/hashicorp/google-beta" produced an invalid new value
for .node_pool_auto_config: block count changed from 0 to 1.

This is a bug in the provider, which should be reported in the provider's own
issue tracker.

Steps to Reproduce

  1. terraform plan
  2. terraform apply

Important Factoids

References

@edwardmedia edwardmedia self-assigned this Aug 30, 2022
@edwardmedia
Contributor

@lauraseidler can you add a config that can be used to repro the issue?

@brokenjacobs

This is happening to us too. Basically, any cluster you provision that uses node pool auto-provisioning or Autopilot will have this optional dynamic block added in when upgrading the provider, which triggers this error. At least that's what it looks like to me.

@philip-harvey

philip-harvey commented Aug 30, 2022

The provider needs to be fixed, but as a workaround we added this to our GKE module:

variable "node_pool_network_tags" {
  description = "Network tags to use in autopilot or auto provisioned node pools"
  type = list(string)
  default = []
}

  dynamic "node_pool_auto_config" {
    for_each = var.cluster_autoscaling.enabled || var.enable_autopilot ? [""]: []
    content {
      dynamic "network_tags" {
        for_each = length(var.node_pool_network_tags) == 0 ? [] : [var.node_pool_network_tags]
        iterator = taglist
        content {
          tags = taglist.value # the iterator's .value holds the list of tags
        }
      }
    }
  }

@brokenjacobs

Also, as an aside: it would be great if the resource output an attribute with the current cluster network tag.

@edwardmedia
Contributor

@brokenjacobs @lauraseidler @philip-harvey can you provide a config that can be used to repro the issue?

@philip-harvey

philip-harvey commented Sep 2, 2022

@edwardmedia as far as I can tell, any GKE cluster config that doesn't explicitly set node_pool_auto_config breaks, since the provider tries to delete node_pool_auto_config (setting it to null?) and this causes the provider to fail. If you set node_pool_auto_config to {}, as in my example above, the provider doesn't fail.

In the example code I provided above we are setting node_pool_auto_config and the provider is OK with this.

Below is an example of the code we are using, which includes returning the network tags. This really needs to be returned by the provider, as @brokenjacobs mentioned.
If you remove the node_pool_auto_config section from this module, it will fail.

variable "addons" {
  description = "Addons enabled in the cluster (true means enabled)."
  type = object({
    cloudrun_config            = bool
    dns_cache_config           = bool
    horizontal_pod_autoscaling = bool
    http_load_balancing        = bool
    istio_config = object({
      enabled = bool
      tls     = bool
    })
    network_policy_config                 = bool
    gce_persistent_disk_csi_driver_config = bool
    config_connector_config               = bool
  })
  default = {
    config_connector_config    = false
    cloudrun_config            = false
    dns_cache_config           = false
    horizontal_pod_autoscaling = true
    http_load_balancing        = true
    istio_config = {
      enabled = false
      tls     = false
    }
    network_policy_config                 = false
    gce_persistent_disk_csi_driver_config = false
  }
}

variable "authenticator_security_group" {
  description = "RBAC security group for Google Groups for GKE, format is [email protected]."
  type        = string
  default     = null
}

variable "cluster_autoscaling" {
  description = "Enable and configure limits for Node Auto-Provisioning with Cluster Autoscaler."
  type = object({
    enabled              = bool
    cpu_min              = number
    cpu_max              = number
    memory_min           = number
    memory_max           = number
    autoscaling_profile  = string
    node_service_account = string
  })
  default = {
    enabled              = false
    cpu_min              = 0
    cpu_max              = 0
    memory_min           = 0
    memory_max           = 0
    autoscaling_profile  = "BALANCED" # Can also be OPTIMIZE_UTILIZATION
    node_service_account = null
  }
}

variable "database_encryption" {
  description = "Enable and configure GKE application-layer secrets encryption."
  type = object({
    enabled  = bool
    state    = string
    key_name = string
  })
  default = {
    enabled  = false
    state    = "DECRYPTED"
    key_name = null
  }
}

variable "default_max_pods_per_node" {
  description = "Maximum number of pods per node in this cluster."
  type        = number
  default     = 110
}

variable "description" {
  description = "Cluster description."
  type        = string
  default     = null
}

variable "dns_config" {
  description = "Configuration for Using Cloud DNS for GKE."
  type = object({
    cluster_dns        = string
    cluster_dns_scope  = string
    cluster_dns_domain = string
  })
  default = {
    cluster_dns        = "PROVIDER_UNSPECIFIED"
    cluster_dns_scope  = "DNS_SCOPE_UNSPECIFIED"
    cluster_dns_domain = ""
  }
}

variable "enable_autopilot" {
  description = "Create cluster in autopilot mode. With autopilot there's no need to create node-pools and some features are not supported (e.g. setting default_max_pods_per_node)"
  type        = bool
  default     = false
}

variable "binary_authorization_evaluation_mode" {
  description = "Mode of operation for Binary Authorization policy evaluation. Valid values are DISABLED and PROJECT_SINGLETON_POLICY_ENFORCE. PROJECT_SINGLETON_POLICY_ENFORCE is functionally equivalent to the deprecated enable_binary_authorization parameter being set to true."
  type = string
  default = "DISABLED"
}


variable "enable_dataplane_v2" {
  description = "Enable Dataplane V2 on the cluster, will disable network_policy addons config"
  type        = bool
  default     = false
}

variable "enable_intranode_visibility" {
  description = "Enable intra-node visibility to make same node pod to pod traffic visible."
  type        = bool
  default     = null
}

variable "enable_l4_ilb_subsetting" {
  description = "Enable L4ILB Subsetting."
  type        = bool
  default     = null
}

variable "enable_shielded_nodes" {
  description = "Enable Shielded Nodes features on all nodes in this cluster."
  type        = bool
  default     = null
}

variable "enable_tpu" {
  description = "Enable Cloud TPU resources in this cluster."
  type        = bool
  default     = null
}

variable "labels" {
  description = "Cluster resource labels."
  type        = map(string)
  default     = null
}

variable "location" {
  description = "Cluster zone or region."
  type        = string
}

variable "logging_config" {
  description = "Logging configuration (enabled components)."
  type        = list(string)
  default     = null
}

variable "logging_service" {
  description = "Logging service (disable with an empty string)."
  type        = string
  default     = "logging.googleapis.com/kubernetes"
}

variable "maintenance_config" {
  description = "Maintenance window configuration"
  type = object({
    daily_maintenance_window = object({
      start_time = string
    })
    recurring_window = object({
      start_time = string
      end_time   = string
      recurrence = string
    })
    maintenance_exclusion = list(object({
      exclusion_name = string
      start_time     = string
      end_time       = string
    }))
  })
  default = {
    daily_maintenance_window = {
      start_time = "03:00"
    }
    recurring_window      = null
    maintenance_exclusion = []
  }
}

variable "master_authorized_ranges" {
  description = "External Ip address ranges that can access the Kubernetes cluster master through HTTPS."
  type        = map(string)
  default     = {}
}

variable "min_master_version" {
  description = "Minimum version of the master, defaults to the version of the most recent official release."
  type        = string
  default     = null
}

variable "monitoring_config" {
  description = "Monitoring configuration (enabled components)."
  type        = list(string)
  default     = null
}

variable "managed_prometheus" {
  description = "Whether or not the managed collection is enabled."
  type        = bool
  default     = false
}

variable "monitoring_service" {
  description = "Monitoring service (disable with an empty string)."
  type        = string
  default     = "monitoring.googleapis.com/kubernetes"
}

variable "name" {
  description = "Cluster name."
  type        = string
}

variable "network" {
  description = "Name or self link of the VPC used for the cluster. Use the self link for Shared VPC."
  type        = string
}

variable "node_locations" {
  description = "Zones in which the cluster's nodes are located."
  type        = list(string)
  default     = []
}

variable "notification_config" {
  description = "GKE Cluster upgrade notifications via PubSub."
  type        = bool
  default     = false
}

variable "peering_config" {
  description = "Configure peering with the master VPC for private clusters."
  type = object({
    export_routes = bool
    import_routes = bool
    project_id    = string
  })
  default = null
}

variable "pod_security_policy" {
  description = "Enable the PodSecurityPolicy feature."
  type        = bool
  default     = null
}

variable "private_cluster_config" {
  description = "Enable and configure private cluster, private nodes must be true if used."
  type = object({
    enable_private_nodes    = bool
    enable_private_endpoint = bool
    master_ipv4_cidr_block  = string
    master_global_access    = bool
  })
  default = null
}

variable "project_id" {
  description = "Cluster project id."
  type        = string
}

variable "release_channel" {
  description = "Release channel for GKE upgrades."
  type        = string
  default     = null
}

variable "resource_usage_export_config" {
  description = "Configure the ResourceUsageExportConfig feature."
  type = object({
    enabled = bool
    dataset = string
  })
  default = {
    enabled = null
    dataset = null
  }
}

variable "secondary_range_pods" {
  description = "Subnet secondary range name used for pods."
  type        = string
}

variable "secondary_range_services" {
  description = "Subnet secondary range name used for services."
  type        = string
}

variable "subnetwork" {
  description = "VPC subnetwork name or self link."
  type        = string
}

variable "vertical_pod_autoscaling" {
  description = "Enable the Vertical Pod Autoscaling feature."
  type        = bool
  default     = null
}

variable "workload_identity" {
  description = "Enable the Workload Identity feature."
  type        = bool
  default     = true
}

variable "node_pool_network_tags" {
  description = "Network tags to use in autopilot or auto provisioned node pools"
  type = list(string)
  default = []
}

locals {
  # The Google provider is unable to validate certain configurations of
  # private_cluster_config when enable_private_nodes is false (provider docs)
  is_private = try(var.private_cluster_config.enable_private_nodes, false)
  peering = try(
    google_container_cluster.cluster.private_cluster_config.0.peering_name,
    null
  )
  peering_project_id = (
    try(var.peering_config.project_id, null) == null
    ? var.project_id
    : var.peering_config.project_id
  )
}

resource "google_container_cluster" "cluster" {
  provider                    = google-beta
  project                     = var.project_id
  name                        = var.name
  description                 = var.description
  location                    = var.location
  node_locations              = length(var.node_locations) == 0 ? null : var.node_locations
  min_master_version          = var.min_master_version
  network                     = var.network
  subnetwork                  = var.subnetwork
  logging_service             = var.logging_config == null ? var.logging_service : null
  monitoring_service          = var.monitoring_config == null ? var.monitoring_service : null
  resource_labels             = var.labels
  default_max_pods_per_node   = var.enable_autopilot ? null : var.default_max_pods_per_node
  enable_intranode_visibility = var.enable_intranode_visibility
  enable_l4_ilb_subsetting    = var.enable_l4_ilb_subsetting
  enable_shielded_nodes       = var.enable_shielded_nodes
  enable_tpu                  = var.enable_tpu
  initial_node_count          = 1
  remove_default_node_pool    = var.enable_autopilot ? null : true
  datapath_provider           = var.enable_dataplane_v2 ? "ADVANCED_DATAPATH" : "DATAPATH_PROVIDER_UNSPECIFIED"
  enable_autopilot            = var.enable_autopilot == true ? true : null

  # node_config {}
  # NOTE: The default node_pool is deleted, so node_config (here) is extraneous.
  # Specify that node_config as a parameter to the gke-nodepool module instead.

  # TODO(ludomagno): compute addons map in locals and use a single dynamic block
  addons_config {
    dynamic "dns_cache_config" {
      for_each = var.enable_autopilot ? [] : [""]
      content {
        enabled = var.addons.dns_cache_config
      }
    }
    config_connector_config {
      enabled = var.addons.config_connector_config
    }
    http_load_balancing {
      disabled = !var.addons.http_load_balancing
    }
    horizontal_pod_autoscaling {
      disabled = !var.addons.horizontal_pod_autoscaling
    }
    dynamic "network_policy_config" {
      for_each = !var.enable_autopilot ? [""] : []
      content {
        disabled = !var.addons.network_policy_config
      }
    }
    cloudrun_config {
      disabled = !var.addons.cloudrun_config
    }
    istio_config {
      disabled = !var.addons.istio_config.enabled
      auth     = var.addons.istio_config.tls ? "AUTH_MUTUAL_TLS" : "AUTH_NONE"
    }
    gce_persistent_disk_csi_driver_config {
      enabled = var.addons.gce_persistent_disk_csi_driver_config
    }
  }

  # TODO(ludomagno): support setting address ranges instead of range names
  # https://www.terraform.io/docs/providers/google/r/container_cluster.html#cluster_ipv4_cidr_block
  ip_allocation_policy {
    cluster_secondary_range_name  = var.secondary_range_pods
    services_secondary_range_name = var.secondary_range_services
  }

  # https://www.terraform.io/docs/providers/google/r/container_cluster.html#daily_maintenance_window
  maintenance_policy {
    dynamic "daily_maintenance_window" {
      for_each = var.maintenance_config != null && lookup(var.maintenance_config, "daily_maintenance_window", null) != null ? [var.maintenance_config.daily_maintenance_window] : []
      iterator = config
      content {
        start_time = config.value.start_time
      }
    }

    dynamic "recurring_window" {
      for_each = var.maintenance_config != null && lookup(var.maintenance_config, "recurring_window", null) != null ? [var.maintenance_config.recurring_window] : []
      iterator = config
      content {
        start_time = config.value.start_time
        end_time   = config.value.end_time
        recurrence = config.value.recurrence
      }
    }

    dynamic "maintenance_exclusion" {
      for_each = var.maintenance_config != null && lookup(var.maintenance_config, "maintenance_exclusion", null) != null ? var.maintenance_config.maintenance_exclusion : []
      iterator = config
      content {
        exclusion_name = config.value.exclusion_name
        start_time     = config.value.start_time
        end_time       = config.value.end_time
      }
    }
  }

  master_auth {
    client_certificate_config {
      issue_client_certificate = false
    }
  }

  dynamic "master_authorized_networks_config" {
    for_each = (
      length(var.master_authorized_ranges) == 0
      ? []
      : [var.master_authorized_ranges]
    )
    iterator = ranges
    content {
      dynamic "cidr_blocks" {
        for_each = ranges.value
        iterator = range
        content {
          cidr_block   = range.value
          display_name = range.key
        }
      }
    }
  }

  # The network_policy block is included when the network_policy_config addon is enabled; network policy itself is disabled when Dataplane V2 is on, since Dataplane V2 has built-in network policies.
  dynamic "network_policy" {
    for_each = var.addons.network_policy_config ? [""] : []
    content {
      enabled  = var.enable_dataplane_v2 ? false : true
      provider = var.enable_dataplane_v2 ? "PROVIDER_UNSPECIFIED" : "CALICO"
    }
  }

  dynamic "private_cluster_config" {
    for_each = local.is_private ? [var.private_cluster_config] : []
    iterator = config
    content {
      enable_private_nodes    = config.value.enable_private_nodes
      enable_private_endpoint = config.value.enable_private_endpoint
      master_ipv4_cidr_block  = config.value.master_ipv4_cidr_block
      master_global_access_config {
        enabled = config.value.master_global_access
      }
    }
  }

  # beta features

  dynamic "authenticator_groups_config" {
    for_each = var.authenticator_security_group == null ? [] : [""]
    content {
      security_group = var.authenticator_security_group
    }
  }

  dynamic "cluster_autoscaling" {
    for_each = var.enable_autopilot ? [] : [var.cluster_autoscaling] # Autopilot and cluster autoscaling are incompatible
    iterator = config
    content {
      enabled = config.value.enabled
      dynamic "resource_limits" {
        for_each = var.cluster_autoscaling.enabled ? [1] : []
        content {
          resource_type = "cpu"
          minimum       = config.value.cpu_min
          maximum       = config.value.cpu_max
        }
      }
      dynamic "resource_limits" {
        for_each = var.cluster_autoscaling.enabled ? [1] : []
        content {
          resource_type = "memory"
          minimum       = config.value.memory_min
          maximum       = config.value.memory_max
        }
      }
      autoscaling_profile = config.value.autoscaling_profile
      // TODO: support GPUs too
      dynamic "auto_provisioning_defaults" {
        for_each = var.cluster_autoscaling.enabled ? [1] : []
        content {
          service_account = config.value.node_service_account
        }
      }
    }
  }

  dynamic "database_encryption" {
    for_each = var.database_encryption.enabled ? [var.database_encryption] : []
    iterator = config
    content {
      state    = config.value.state
      key_name = config.value.key_name
    }
  }

  dynamic "pod_security_policy_config" {
    for_each = var.pod_security_policy != null ? [""] : []
    content {
      enabled = var.pod_security_policy
    }
  }

  dynamic "release_channel" {
    for_each = var.release_channel != null ? [""] : []
    content {
      channel = var.release_channel
    }
  }

  dynamic "resource_usage_export_config" {
    for_each = (
      var.resource_usage_export_config.enabled != null
      &&
      var.resource_usage_export_config.dataset != null
      ? [""] : []
    )
    content {
      enable_network_egress_metering = var.resource_usage_export_config.enabled
      bigquery_destination {
        dataset_id = var.resource_usage_export_config.dataset
      }
    }
  }

  dynamic "vertical_pod_autoscaling" {
    for_each = var.vertical_pod_autoscaling == null ? [] : [""]
    content {
      enabled = var.vertical_pod_autoscaling
    }
  }

  dynamic "workload_identity_config" {
    for_each = var.workload_identity && !var.enable_autopilot ? [""] : []
    content {
      workload_pool = "${var.project_id}.svc.id.goog"
    }
  }

  dynamic "monitoring_config" {
    for_each = var.monitoring_config != null ? [""] : []
    content {
      enable_components = var.monitoring_config
      dynamic "managed_prometheus" {
        for_each = var.managed_prometheus ? [var.managed_prometheus] : []
        content {
          enabled = var.managed_prometheus
        }
      }
    }
  }

  dynamic "logging_config" {
    for_each = var.logging_config != null ? [""] : []
    content {
      enable_components = var.logging_config
    }
  }

  dynamic "dns_config" {
    for_each = var.dns_config != null ? [var.dns_config] : []
    iterator = config
    content {
      cluster_dns        = config.value.cluster_dns
      cluster_dns_scope  = config.value.cluster_dns_scope
      cluster_dns_domain = config.value.cluster_dns_domain
    }
  }

  dynamic "notification_config" {
    for_each = var.notification_config ? [""] : []
    content {
      pubsub {
        enabled = var.notification_config
        topic   = var.notification_config ? google_pubsub_topic.notifications[0].id : null
      }
    }
  }

  dynamic "node_pool_auto_config" {
    for_each = var.cluster_autoscaling.enabled || var.enable_autopilot ? [""]: []
    content {
      dynamic "network_tags" {
        for_each = length(var.node_pool_network_tags) == 0 ? [] : [var.node_pool_network_tags]
        iterator = taglist
        content {
          tags = taglist.value # the iterator's .value holds the list of tags
        }
      }
    }
  }

  # Defaults to DISABLED
  binary_authorization {
    evaluation_mode = var.binary_authorization_evaluation_mode
  }

  lifecycle {
    ignore_changes = [
      dns_config
    ]
  }
}

resource "google_compute_network_peering_routes_config" "gke_master" {
  count                = local.is_private && var.peering_config != null ? 1 : 0
  project              = local.peering_project_id
  peering              = local.peering
  network              = element(reverse(split("/", var.network)), 0)
  import_custom_routes = var.peering_config.import_routes
  export_custom_routes = var.peering_config.export_routes
}

resource "google_pubsub_topic" "notifications" {
  count   = var.notification_config ? 1 : 0
  name    = "gke-pubsub-notifications"
  project = var.project_id
  labels = {
    content = "gke-notifications"
  }
}

data "google_client_config" "default" {}

data "http" "gke_cluster" {
  url = "https://container.googleapis.com/v1/projects/${var.project_id}/locations/${var.location}/clusters/${var.name}"
  request_headers = {
    Accept = "application/json; charset=utf-8"
    Authorization = "Bearer ${data.google_client_config.default.access_token}"
  }
  depends_on = [google_container_cluster.cluster]
}

output "ca_certificate" {
  description = "Public certificate of the cluster (base64-encoded)."
  value       = google_container_cluster.cluster.master_auth.0.cluster_ca_certificate
  sensitive   = true
}

output "cluster" {
  description = "Cluster resource."
  sensitive   = true
  value       = google_container_cluster.cluster
}

output "endpoint" {
  description = "Cluster endpoint."
  value       = google_container_cluster.cluster.endpoint
}

output "location" {
  description = "Cluster location."
  value       = google_container_cluster.cluster.location
}

output "master_version" {
  description = "Master version."
  value       = google_container_cluster.cluster.master_version
}

output "name" {
  description = "Cluster name."
  value       = google_container_cluster.cluster.name
}

output "notifications" {
  description = "GKE PubSub notifications topic."
  value       = var.notification_config ? google_pubsub_topic.notifications[0].id : null
}

output "node_network_tag" {
  value = "gke-${var.name}-${substr(jsondecode(data.http.gke_cluster.response_body).id,0,8)}-node"
}

output "http_cluster" {
  value = data.http.gke_cluster
}

scheuk added a commit to TakeoffTech/netbox-gcp-deployment that referenced this issue Sep 6, 2022
@edwardmedia
Contributor

@shuyama1 can you take a look at this?

@jdpribula

This is becoming very impactful, especially because it can cause node pools to be deleted but not recreated when making immutable updates to pools.

@shuyama1
Collaborator

Thanks for the feedback! I was able to reproduce this error locally and I'm working on a fix now.

@billyfoss

Something seems different about the process of detecting changes in the node pool. I've been working on workarounds to enable dynamically updating node pool labels, tags, and taints (recently supported by the Google API if the cluster is >= 1.23, or if the cluster is on 1.22 and node pool autoscaling is turned off).

Yesterday I ran into a problem where updating the node pool configuration (labels, taints, or tags) via the gcloud CLI caused the nodes in that node pool to be recreated, so I've been trying to reproduce that issue today. In the process, I have seen a permanent diff where the node pool has linuxNodeConfig: {} set and Terraform wants to remove it, but can't; I believe that was with 4.34. Running today with 4.36, I noticed that only changes to the tags trigger a forced replacement of the resource. If I manually change the tags back to match, Terraform does not detect the differences in the labels and taints and reports "No changes. Your infrastructure matches the configuration."

Here is example output from the plan where the label and taint differences are visible, but they don't trigger Terraform to change them.

      ~ node_config {
          ~ disk_size_gb      = 100 -> (known after apply)
          ~ disk_type         = "pd-standard" -> (known after apply)
          ~ guest_accelerator = [] -> (known after apply)
          ~ image_type        = "COS_CONTAINERD" -> (known after apply)
          ~ labels            = {
              - "my-node-pool" = "true"
            } -> (known after apply)
          ~ local_ssd_count   = 0 -> (known after apply)
          ~ metadata          = {
              - "disable-legacy-endpoints" = "true"
            } -> (known after apply)
          + min_cpu_platform  = (known after apply)
          - tags              = [
              - "private",
            ] -> null # forces replacement
          ~ taint             = [
              - {
                  - effect = "NO_SCHEDULE"
                  - key    = "draining_again"
                  - value  = "true"
                },
            ] -> (known after apply)
            # (5 unchanged attributes hidden)

          ~ shielded_instance_config {
              ~ enable_integrity_monitoring = true -> (known after apply)
              ~ enable_secure_boot          = false -> (known after apply)
            }

          + workload_metadata_config {
              + mode = (known after apply)
            }
        }

Is the extra layer of node_pool_defaults preventing the manual changes from being detected? Or has the provider been updated to support dynamically updating the labels and taints?

@lauraseidler
Author

With the update to 4.37.0, the same problem now also happens with node_pool_defaults.

@shuyama1
Collaborator

The PR has not been merged and released yet. You could pin the provider version to 4.33.0 to avoid the issue in the short term until the fix is released; the PR is targeting the next release.
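
For reference, a minimal sketch of such a pin, assuming the provider source addresses shown in the version output above (the exact constraint style is just one reasonable option):

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "= 4.33.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "= 4.33.0" # stay below 4.34.0 until the fix is released
    }
  }
}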

@thameezb

thameezb commented Sep 30, 2022

4.38.0 has been released, and the issue still persists for both node_pool_defaults and node_pool_auto_config. This has not been resolved.

Apply output:


│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.cluster.google_container_cluster.cluster to include new values learned so far during apply, provider
│ "registry.terraform.io/hashicorp/google-beta" produced an invalid new value for .node_pool_defaults: block count changed from 0 to 1.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for module.cluster.google_container_cluster.cluster to include new values learned so far during apply, provider
│ "registry.terraform.io/hashicorp/google-beta" produced an invalid new value for .node_pool_auto_config: block count changed from 0 to 1.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

@dani2819

Agree with @thameezb. I have the same issue on the latest release, i.e. 4.38.0, with both node_pool_defaults and node_pool_auto_config. The block count changes without the block being specified.

@Malik-Muneeb

Agree with @thameezb. I'm facing the same issue with v4.38.0 on both node_pool_defaults and node_pool_auto_config.

@shuyama1
Collaborator

shuyama1 commented Oct 6, 2022

The fix is released in v4.39.0. Can you try upgrading to v4.39.0 and see if the issue still occurs?

@Malik-Muneeb

The issue is resolved now with v4.39.0. Thanks much!

@shuyama1
Collaborator

shuyama1 commented Oct 6, 2022

Glad it's solved! Thanks for the update!

@github-actions

github-actions bot commented Nov 6, 2022

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 6, 2022
@github-actions github-actions bot added service/container forward/review In review; remove label to forward labels Jan 14, 2025