
Redis/sentinel not working with IPv4 cluster #5957

Closed · 3 tasks done · rumstead opened this issue Apr 3, 2021 · 13 comments · Fixed by #6005

Labels: bug (Something isn't working) · cherry-pick/2.0 (Candidate for cherry picking into the 2.0 release branch)
Milestone: v2.1

Comments
rumstead (Member) commented Apr 3, 2021

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

I upgraded to ArgoCD v2.0.0-rc2 and the argocd-redis-ha-server StatefulSet wasn't starting. The logs contained address-family errors.

Additionally, argocd-server is unable to connect to argocd-redis-ha-haproxy.

To Reproduce

Deploy ArgoCD v2.0.0-rc2 to a Kubernetes cluster with IPv6 disabled.

Expected behavior

Redis and sentinel start up cleanly.

Version

❯ argocd version                                                                                                     
argocd: v1.8.7+eb3d1fb.dirty
  BuildDate: 2021-03-07T19:33:57Z
  GitCommit: eb3d1fb84b9b77cdffd70b14c4f949f1c64a9416
  GitTreeState: dirty
  GoVersion: go1.16
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.0.0-rc2+9603ae3
  BuildDate: 2021-03-29T21:29:13Z
  GitCommit: 9603ae37765dadd6e6db519896b8065ca277a775
  GitTreeState: clean
  GoVersion: go1.16
  Compiler: gc
  Platform: linux/amd64
  Ksonnet Version: v0.13.1
  Kustomize Version: v3.9.4 2021-02-09T19:22:10Z
  Helm Version: v3.5.1+g32c2223
  Kubectl Version: v0.20.4
  Jsonnet Version: v0.17.0

Logs
argocd-server

redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: write tcp 172.29.68.15:43426->10.100.200.54:6379: write: broken pipe
redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: write tcp 172.29.68.15:43426->10.100.200.54:6379: write: broken pipe
redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: EOF
redis: 2021/04/03 20:33:44 pubsub.go:168: redis: discarding bad PubSub connection: EOF

argocd-redis-ha-server

13:C 03 Apr 2021 19:53:15.706 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
13:C 03 Apr 2021 19:53:15.706 # Redis version=6.2.1, bits=64, commit=00000000, modified=0, pid=13, just started
13:C 03 Apr 2021 19:53:15.706 # Configuration loaded
13:M 03 Apr 2021 19:53:15.706 * monotonic clock: POSIX clock_gettime
13:M 03 Apr 2021 19:53:16.003 # Could not create server TCP listening socket ::*:6379: unable to bind socket, errno: 97

argocd-redis-ha-haproxy

[NOTICE] 092/202813 (1) : New worker #1 (7) forked
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_0/R0 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_0/R1 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_0/R2 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 092/202821 (7) : backend 'check_if_redis_is_master_0' has no server available!
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_1/R0 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_1/R1 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_1/R2 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 092/202821 (7) : backend 'check_if_redis_is_master_1' has no server available!
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_2/R0 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_2/R1 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 092/202821 (7) : Server check_if_redis_is_master_2/R2 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 092/202821 (7) : backend 'check_if_redis_is_master_2' has no server available!
[WARNING] 092/202821 (7) : Server bk_redis_master/R0 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 2 active and 0 backup servers left. 4 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 092/202821 (7) : Server bk_redis_master/R1 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 092/202821 (7) : Server bk_redis_master/R2 is DOWN, reason: Layer4 timeout, info: " at step 1 of tcp-check (connect)", check duration: 3000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 092/202821 (7) : backend 'bk_redis_master' has no server available!

I was able to get redis and sentinel running by adding the following to redis.conf and sentinel.conf (errno 97 is EAFNOSUPPORT: Redis is trying to open a listener on the IPv6 wildcard ::, which fails when the kernel has IPv6 disabled):

bind 127.0.0.1

I am still trying to figure out what is happening with argocd-redis-ha-haproxy.

rumstead added the bug (Something isn't working) label on Apr 3, 2021
rumstead (Member, Author) commented Apr 3, 2021

Updating the bind directive in redis.conf and sentinel.conf to the following seems to have fixed everything:

bind 0.0.0.0

Happy to create a PR if we feel that is the best course of action. I don't have an IPv6 cluster readily available to test.
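
For anyone applying the workaround by hand before a fix lands, here is a minimal sketch of where the directive ends up. It assumes the expanded HA manifests ship redis.conf and sentinel.conf in a ConfigMap named argocd-redis-ha-configmap (a name taken from the expanded redis-ha chart; verify it against your rendered manifests), and it abbreviates everything except the bind lines:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-redis-ha-configmap   # assumed name; verify against your rendered manifests
  namespace: argocd
data:
  redis.conf: |
    # Explicit IPv4 wildcard bind. Without it, Redis 6.2 also tries the IPv6
    # wildcard :: and fails with errno 97 when the kernel has IPv6 disabled.
    bind 0.0.0.0
    # ...rest of the shipped redis.conf unchanged...
  sentinel.conf: |
    bind 0.0.0.0
    # ...rest of the shipped sentinel.conf unchanged...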

jessesuen (Member) commented Apr 5, 2021

I thought we switched away from using redis sentinel a while back. Are you running a default install of Argo CD? How did you install it?

rumstead (Member, Author) commented Apr 5, 2021

I use https://github.com/argoproj/argo-cd/manifests/ha/cluster-install?ref=v2.0.0-rc2 as my remote base, though I wouldn't put it past myself to have done something wrong.

Maybe my terminology is incorrect, but I still see it here.

stefanhenseler commented

I can confirm the issue. It happens when IPv6 is disabled, and the suggested fix solves it. I'm using the v2.0.0 release from today.

rumstead (Member, Author) commented Apr 7, 2021

Appreciate the confirmation @stefanhenseler (makes me feel like I am not losing my mind here with my remote bases LOL).

stefanhenseler commented

> Appreciate the confirmation @stefanhenseler (makes me feel like I am not losing my mind here with my remote bases LOL).

No worries, I was actually glad I saw this issue, because it was working on my GKE cluster but not on VMware Tanzu (PKS).

I'm using the following base: https://github.com/argoproj/argo-cd/tree/master/manifests/ha/[email protected]

rumstead (Member, Author) commented Apr 7, 2021

Yup, I am using PKS v1.9.4

stefanhenseler commented Apr 7, 2021

> Yup, I am using PKS v1.9.4

Me too, same version.

rumstead (Member, Author) commented Apr 8, 2021

@jessesuen I think it's clear that it is coming from the manifests provided by this repo. Do you have any suggestions on how to proceed? Do we just patch the expanded redis-ha manifest from our end or would it make sense to provide this patch upstream in the expanded chart?

Patching isn't super clean either :( kubernetes-sigs/kustomize#680
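
For reference, a downstream kustomize workaround would look roughly like the overlay below. The catch, and the reason the linked kustomize issue matters, is that a strategic-merge patch replaces each ConfigMap data key wholesale, so the patch file has to restate the entire redis.conf and sentinel.conf just to add one bind line. The base URL is the one quoted earlier in the thread; the ConfigMap name is assumed from the expanded chart, and the file contents are abbreviated:

# kustomization.yaml -- overlay on the remote HA base
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
  - https://github.com/argoproj/argo-cd/manifests/ha/cluster-install?ref=v2.0.0-rc2
patchesStrategicMerge:
  - redis-ha-configmap-patch.yaml

# redis-ha-configmap-patch.yaml -- must carry the FULL desired file contents,
# since kustomize replaces each data key rather than merging into it
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-redis-ha-configmap   # assumed name; verify against the expanded manifest
data:
  redis.conf: |
    bind 0.0.0.0
    # ...the rest of the shipped redis.conf, restated in full...
  sentinel.conf: |
    bind 0.0.0.0
    # ...the rest of the shipped sentinel.conf, restated in full...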

alexmt added this to the v2.1 milestone on Apr 8, 2021
alexmt added the cherry-pick/2.0 (Candidate for cherry picking into the 2.0 release branch) label on Apr 8, 2021
alexmt (Collaborator) commented Apr 8, 2021

Looking at it. Patching is not perfect, of course. I think we should modify the expanded redis-ha manifest and cherry-pick the change to v2.0.

rumstead (Member, Author) commented Apr 8, 2021

@alexmt - are you suggesting just adding a values.yaml override in your manifests to add the bind here?
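
If that is the direction, a values override for the upstream redis-ha chart might look roughly like the sketch below. The redis.config and sentinel.config keys are assumptions based on that chart's values layout, so verify them against the chart version the manifests are generated from:

# values.yaml sketch for regenerating the expanded redis-ha manifest
redis:
  config:
    bind: "0.0.0.0"   # assumed key; rendered into redis.conf as a bind directive
sentinel:
  config:
    bind: "0.0.0.0"   # assumed key; rendered into sentinel.conf as a bind directive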

alexmt pushed a commit that referenced this issue Apr 12, 2021
…6005)

* fix(redis-ha): Adding explicit bind to redis and sentinel config to support IPv4 clusters. Closes #5957

Signed-off-by: Ryan Umstead <[email protected]>
yujunz added a commit to abcue/argo-cd that referenced this issue Apr 15, 2021
5bc7297 fix: bitbucket server failing diagnostics:ping (argoproj#6029) (argoproj#6034)
8f53bd5 fix: add helm dependencies with custom CA (argoproj#6003)
8fd6f13 docs: Custom resource actions (argoproj#5838)
8a2897d docs: update delete policy verbiage (argoproj#6025)
c847bd9 chore: remove Argo CD CRDs from namespaced install (argoproj#6022)
61080b3 docs: improve Orphaned Resources Monitoring with more examples and correct grammar (argoproj#6006)
8301d39 Adding explicit bind to redis and sentinel for IPv4 clusters argoproj#5957 (argoproj#6005)
12cabdf fix: adding tests for helm OCI registry (argoproj#5978)
9da9514 docs: Add Ant Group to the list of users (argoproj#6011)
5e34a8a add Polarpoint.io (argoproj#6010)
2f92777 chore: move access checks from api server to repo server (argoproj#5940)
ae2d0ff fix(ui): Unscheduled pods in node view are now visible. Fixes argoproj#5981 (argoproj#5988)
b003f70 docs: SealedSecret status missing on k8s 1.16+ (argoproj#5846)
445872f fix: use correct field for evaluating whether or not GitHub Enterprise is selected (argoproj#5987)
9afa833 chore: Make e2e tests runnable against remote cluster (argoproj#5895)
shubhamagarwal19 pushed a commit to shubhamagarwal19/argo-cd that referenced this issue Apr 15, 2021
…#5957 (argoproj#6005)

* fix(redis-ha): Adding explicit bind to redis and sentinel config to support IPv4 clusters. Closes argoproj#5957

Signed-off-by: Ryan Umstead <[email protected]>
elmazzun commented

@rumstead so what was the solution, binding both redis and sentinel to 0.0.0.0?

rumstead (Member, Author) commented Dec 18, 2023

> @rumstead so what was the solution, binding both redis and sentinel to 0.0.0.0?

Yup, #6005

The issue resurfaced again in #11388.
