
[ERR] agent: Coordinate update error: No cluster leader #36

Open
dukelyuu opened this issue Jul 20, 2018 · 12 comments

Comments

@dukelyuu

I have deployed the latest Consul version on Kubernetes v1.10.0, but the Consul pod's log shows these error messages:
2018/07/20 11:26:11 [WARN] agent: Check "service:ribbon-consumer" HTTP request failed: Get http://DESKTOP-MCQSJ49:8504/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/07/20 11:26:15 [ERR] agent: failed to sync remote state: No cluster leader
2018/07/20 11:26:16 [ERR] agent: Coordinate update error: No cluster leader

The cluster doesn't work correctly.

@gabrielfsousa

gabrielfsousa commented Jul 25, 2018

It's because one of the Consul replicas must boot with the -bootstrap option. Since this is a single-file StatefulSet, add the option -bootstrap-expect=3.

If you are using a different number of Consul replicas, change 3 to the number of replicas you are using.
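In a StatefulSet, that flag goes on the server agent's command line. A minimal sketch of such an invocation, assuming a 3-replica StatefulSet named `consul` (the hostnames, data dir, and `NAMESPACE` variable here are illustrative, not from the original issue):

```shell
# Hypothetical server startup for a 3-replica StatefulSet.
# -bootstrap-expect must match spec.replicas; otherwise the servers
# wait forever for peers that will never appear and no leader is elected.
exec consul agent \
  -server \
  -bootstrap-expect=3 \
  -data-dir=/consul/data \
  -retry-join="consul-0.consul.${NAMESPACE}.svc" \
  -retry-join="consul-1.consul.${NAMESPACE}.svc" \
  -retry-join="consul-2.consul.${NAMESPACE}.svc"
```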

@karthikeayan

Getting the same error:

2018/11/15 10:29:08 [INFO] agent: Discovered LAN servers:
2018/11/15 10:29:08 [WARN] agent: Join LAN failed: No servers to join, retrying in 30s
2018/11/15 10:29:15 [WARN] raft: no known peers, aborting election
2018/11/15 10:29:15 [ERR] agent: failed to sync remote state: No cluster leader
2018/11/15 10:29:23 [ERR] http: Request GET /v1/kv/config/gateway-prod/?recurse&token=<hidden>, error: No cluster leader from=10.233.68.72:32798

==> Newer Consul version available: 1.4.0 (currently running: 1.4.0)

@micksear

micksear commented Feb 1, 2019

I also have this error. I have everything running in a namespace. Would that affect the label-based discovery, perhaps? I can see pods are running if I select with labels:

kubectl -n consul get po -l app=consul,component=server
NAME       READY   STATUS    RESTARTS   AGE
consul-0   1/1     Running   0          6m
consul-1   1/1     Running   0          7m
consul-2   1/1     Running   0          7m

I've updated to Consul 1.4.2, and I'm running on GKE 1.11.6-gke.3.

My consul logs indicate no discovered servers:

2019/02/01 16:58:45 [ERR] agent: Coordinate update error: No cluster leader
2019/02/01 16:58:48 [ERR] agent: failed to sync remote state: No cluster leader
2019/02/01 16:58:49 [INFO] agent: Discovered LAN servers:
2019/02/01 16:58:49 [WARN] agent: Join LAN failed: No servers to join, retrying in 30s

I'm not sure what to check at this point. I have -bootstrap-expect=3 enabled, but I wouldn't expect that to do anything if no other servers can be discovered...

@goughlee

I had the same error with a Docker-hosted Consul cluster (not on Kubernetes, though), and it turned out all of my instances had auto-generated the same node IDs. As soon as I manually set a different node ID on each instance (using the -node-id argument), all was fine. Perhaps something to try.
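One way to guarantee distinct IDs, sketched under the assumption that each container runs on Linux with access to the kernel's random-UUID source (the -node-id flag is real; the surrounding script is illustrative):

```shell
# Generate a fresh, unique node ID for this instance before starting
# the agent. /proc/sys/kernel/random/uuid yields a random RFC 4122
# UUID on every read, so two instances can never collide.
NODE_ID="$(cat /proc/sys/kernel/random/uuid)"
echo "starting agent with node-id ${NODE_ID}"
# consul agent -node-id="${NODE_ID}" ...   (rest of agent start elided)
```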

@e100

e100 commented May 24, 2019

@micksear

I had the same issue when running in a different namespace with Consul 1.5.1. Editing server.json fixed it:

  "retry_join": [
    "provider=k8s namespace=customnamespace label_selector=\"app=consul,component=server\""
  ]

@itsecforu

Got the same error with bootstrap-expect=3 in my consul.yaml.
All pods are in the same namespace.

@itsecforu

Did somebody solve it?

@Batirchik

Batirchik commented Feb 17, 2020

Bumped into this issue today. The issue is caused by affinity settings.
By default there are 3 replicas, and if you have fewer than 3 nodes (e.g. 2), one pod won't come up and you will get the mentioned error. So make sure that you have the corresponding number of nodes.
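The anti-affinity in question looks roughly like this (a sketch of the kind of rule shipped in the stock chart, reusing the label selectors seen earlier in this thread; your chart's exact values may differ):

```yaml
# With required pod anti-affinity like this, no two server pods may
# share a node, so you need at least as many schedulable nodes as
# server replicas; otherwise one pod stays Pending and -bootstrap-expect
# is never satisfied, hence "No cluster leader".
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: consul
            component: server
        topologyKey: kubernetes.io/hostname
```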

@gkannan66235

gkannan66235 commented May 29, 2020

Error from consul:
2020-05-29T04:19:22.499Z [INFO]  agent: Joining cluster...: cluster=LAN
2020-05-29T04:19:22.499Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-server-0.consul-server.n1.svc, consul-server-1.consul-server.n1.svc, consul-server-2.consul-server.n1.svc]
2020-05-29T04:19:22.543Z [WARN]  agent.server.memberlist.lan: memberlist: Failed to resolve consul-server-1.consul-server.n1.svc: lookup consul-server-1.consul-server.n1.svc on 10.0.0.10:53: no such host
2020-05-29T04:25:01.506Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-05-29T04:25:06.768Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2020-05-29T04:25:29.271Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"

But all Consul pods are in Running status, and if we run consul join manually it works:

NAME              READY   STATUS    RESTARTS   AGE
consul-server-0   1/2     Running   0          13m
consul-server-1   1/2     Running   0          13m
consul-server-2   1/2     Running   0          13m
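The manual join that works can be scripted against the running pods. A sketch, assuming the pod, namespace, and service names from the log above (adjust for your own deployment; this is a workaround, not a fix):

```shell
# Manually join server-1 and server-2 to server-0 so a quorum forms.
# Names follow the log output above and may differ in your cluster.
for i in 1 2; do
  kubectl -n n1 exec "consul-server-$i" -c consul -- \
    consul join consul-server-0.consul-server.n1.svc
done
```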

@gupf0719

gupf0719 commented Feb 2, 2021

This bug is still not resolved in the current version, 1.9.1.

@deeco

deeco commented May 26, 2021

> @micksear
> I had same issue when running in a different namespace with Consul 1.5.1
> Editing server.json fixed it:
>
>   "retry_join": [
>     "provider=k8s namespace=customnamespace label_selector=\"app=consul,component=server\""
>   ]

This resolved my issue for a cluster deployed into the consul namespace; I updated the server JSON in the ConfigMap manifest to include the below, as per @e100:

"retry_join": [
    "provider=k8s namespace=consul label_selector=\"app=consul,component=server\""
 ]

@Carmezim

I've seen this issue occur for multiple people several times.

On k8s, besides setting -bootstrap-expect to the number of servers you're running (e.g. 3-5 pods), deleting all PVCs and volumes after uninstalling Consul completely was the only solution that worked for me.

It didn't matter what I did: across (Helm-based) reinstalls, Consul would be unable to properly bootstrap and elect a leader until not only all components but also the PVCs and volumes were removed from the (k8s) cluster.

This note should be in the k8s section, btw.

cc @gupf0719
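That uninstall-and-wipe sequence might look like this (illustrative; assumes a Helm release named consul in the consul namespace, with PVCs labeled app=consul; adjust names and labels for your install):

```shell
# Full teardown so stale Raft state in old volumes can't poison the
# next bootstrap: remove the release, then its persistent volume claims.
helm uninstall consul -n consul
kubectl -n consul delete pvc -l app=consul
# Only then reinstall; the servers start with empty data dirs and
# -bootstrap-expect can elect a fresh leader.
```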
