Unable to create a GKE cluster #898

Closed
koalalorenzo opened this issue Dec 24, 2017 · 10 comments · Fixed by #924

@koalalorenzo
Contributor

Reporting the same as in:
hashicorp/terraform#16981

This is probably a more appropriate place to report it.

Terraform Version

Terraform v0.11.1
+ provider.google v1.2.0 (but also v1.4.0)
+ provider.random v1.0.0

Terraform Configuration Files

resource "google_container_cluster" "the_cluster" {
  name               = "the-cluster-${random_string.the_cluster_id.result}"
  zone               = "europe-west1-c"
  min_master_version = "1.8.4-gke.1"
  initial_node_count = 1

  logging_service    = "none"
  monitoring_service = "none"

  maintenance_policy {
    daily_maintenance_window {
      start_time = "03:00"
    }
  }

  addons_config {
    http_load_balancing {
      disabled = true
    }
  }

  node_config {
    disk_size_gb = "32"
    image_type   = "COS"
    machine_type = "n1-standard-2"
    preemptible  = false

    labels {
      pool    = "default-pool"
      cluster = "the-cluster"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}

Note: The error also occurs when creating a google_container_node_pool.

Debug Output

Crash Output

Expected Behavior

The cluster is created within 15-30 minutes

Actual Behavior

Error from Google API: All cluster resources were brought up, but the cluster API is reporting that only 0 nodes out of 1 have registered. Cluster may be unhealthy.

Steps to Reproduce

  1. terraform init
  2. terraform apply

Important Factoids

Running on the latest version of macOS (High Sierra)

References

@koalalorenzo
Contributor Author

After testing with different setups, the only way to make it work was to enable monitoring and logging. The VM created by the cluster was not reachable via SSH, even from Cloud Shell.

I had to comment out these two lines so that logging and monitoring stay enabled:

#  logging_service    = "none"
#  monitoring_service = "none"

I think this is some sort of bug, because the CLI and the portal can deploy the same cluster with no problem.
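
As a sketch of the same change written explicitly rather than by commenting lines out: the two values below are what I understand the provider's defaults to be, so treat them as an assumption and check them against the provider documentation for your version.

```hcl
# Fragment of the google_container_cluster block from the report.
# Assumption: these are the provider's default service names, so this
# should behave the same as leaving both arguments unset.
logging_service    = "logging.googleapis.com"
monitoring_service = "monitoring.googleapis.com"
```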

@rosbo rosbo changed the title from "Unable to createa a GKE cluster - Google Cloud" to "Unable to create a GKE cluster" Jan 4, 2018
@danawillow danawillow self-assigned this Jan 4, 2018
@danawillow
Contributor

Hey @koalalorenzo, I've been playing around with this and I have a solid idea of what's going on.

When you create a cluster in the UI or with gcloud, it comes with a default set of permissions that are enabled for the service account running on the instances that get created. You can see this in the console under "Access Scopes". If you try to edit these in the console, you can see that it doesn't allow disabling the scopes for logging/monitoring. However, when you use the REST API (which Terraform does), if you don't specify scopes it sends them all as disabled. GKE needs the monitoring scope to be there in order for the nodes to register.

A quick fix you can use for this would be to just enable the monitoring scope in your node_config block:

    oauth_scopes = ["monitoring"]

I'll also look into setting some default scopes in Terraform, and follow up with the GKE team on why they allow you to disable the monitoring scope in the REST API but not in the console.
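
For context, here is a trimmed sketch of where that line would sit in the configuration from the report. The literal cluster name is a stand-in for the original `random_string` interpolation, and only the arguments needed to show the workaround are kept.

```hcl
resource "google_container_cluster" "the_cluster" {
  # Stand-in name; the report builds it from random_string.the_cluster_id.
  name               = "the-cluster"
  zone               = "europe-west1-c"
  initial_node_count = 1

  # Both services stay disabled, as in the original report.
  logging_service    = "none"
  monitoring_service = "none"

  node_config {
    machine_type = "n1-standard-2"

    # Workaround: request the monitoring scope explicitly, since the
    # REST API does not add it when no scopes are specified.
    oauth_scopes = ["monitoring"]
  }
}
```

This only covers nodes created as part of the cluster resource; a separately managed node pool would need the same setting in its own node_config block.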

@koalalorenzo
Contributor Author

OK, so there should be a PR that enables the monitoring and logging APIs by default. Is that going to fix it?

@danawillow
Contributor

Yeah, that would fix it. I'm just confirming with people in GKE whether we want to add any others to be enabled by default too.

@ikehz

ikehz commented Jan 5, 2018

Just adding my 2¢: this is primarily an issue because you've turned off cloud logging & monitoring (which are on by default):

  logging_service    = "none"
  monitoring_service = "none"

Starting in v1.9, this bug is fixed: we'll always add the monitoring scope to nodes (which is necessary for other reasons as well), regardless of whether you've enabled cloud monitoring.

@koalalorenzo
Contributor Author

I am almost sure that google_container_node_pool is affected as well. @danawillow, does the PR also fix the node pool? When I create nodes, the operation times out, but if I add the OAuth scopes I have no problem.

I am adding to both google_container_node_pool and google_container_cluster the following scopes:

    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
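
A minimal sketch of how those scopes might look on a separate node pool. The resource name `extra_pool`, the pool name, and the node count are hypothetical, and the pool is assumed to attach to the `google_container_cluster.the_cluster` resource from the original report.

```hcl
resource "google_container_node_pool" "extra_pool" {
  # Hypothetical pool name and size, chosen for illustration only.
  name       = "extra-pool"
  zone       = "europe-west1-c"
  cluster    = "${google_container_cluster.the_cluster.name}"
  node_count = 1

  node_config {
    machine_type = "n1-standard-2"

    # Same scope list as above, so the nodes can register with the cluster.
    oauth_scopes = [
      "https://www.googleapis.com/auth/compute",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
    ]
  }
}
```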

@danawillow
Contributor

Yes, that PR fixes the issue for google_container_node_pool as well. Version 1.5.0 of the google provider has the fix. Is that the version you're using?

@xiuliren

xiuliren commented Feb 6, 2019

I got this error using the web console.

@xiuliren

xiuliren commented Feb 6, 2019

I had to remove the taint nvidia.com/gpu=present: NO_SCHEDULE, since it will be set up automatically once the GPU instances are running.

@ghost

ghost commented Feb 7, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Feb 7, 2019