
Pod recovery: pod IPs are not synced to nodes.conf #20

Closed
brucemei opened this issue Aug 24, 2020 · 10 comments

Comments

@brucemei

When one pod fails and is recreated, its own IP (the "myself" entry) in the persisted nodes.conf becomes invalid.

I suggest refreshing it at container start-up.
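Roughly what I have in mind, as a sketch only (the /data/nodes.conf path and the POD_IP environment variable are assumptions for illustration, not necessarily what the operator image uses):

refresh_nodes_conf() {
    # Replace the stale IP on this node's own ("myself") line with the pod's
    # current IP before redis-server reads the persisted cluster config.
    if [ -f /data/nodes.conf ] && [ -n "${POD_IP}" ]; then
        sed -i -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" /data/nodes.conf
    fi
}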

@iamabhishek-dubey
Member

Can you please share the logs and a screenshot? Also, are you using the latest version?

@brucemei
Author

brucemei commented Aug 28, 2020

The operator was deployed from the master branch source code.

When I apply redis.yaml (based on example/redis-cluster-example.yaml), the Redis master and slave come up successfully. If I then delete the Redis cluster with the same redis.yaml and re-apply it a few minutes later, the new pods run fine, but the Redis cluster is in a failed state: all pod IPs have changed, while the persisted nodes.conf in the PVCs still holds the old, now invalid, entries.
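In short, the steps I ran were roughly (sketch):

kubectl apply -f redis.yaml     # cluster forms; master and slave come up fine
kubectl delete -f redis.yaml    # pods and services are removed, but the PVCs (and nodes.conf in them) are retained
# a few minutes later
kubectl apply -f redis.yaml     # new pods get new IPs, while nodes.conf in the PVCs still lists the old ones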

My redis.yaml is below:

apiVersion: redis.opstreelabs.in/v1alpha1
kind: Redis
metadata:
  name: redis
spec:
  mode: cluster
  size: 3
  global:
    image: opstree/redis:v2.0
    imagePullPolicy: Always
    password: "N1A8mhMAVqxx"
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 100m
        memory: 128Mi
  master:
    service:
      type: ClusterIP
  slave:
    service:
      type: ClusterIP
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:1.0
    imagePullPolicy: Always
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 100m
        memory: 128Mi
  storage:
    VolumeClaimTemplates:
      spec:
        accessModes: 
          - ReadWriteOnce
        storageClassName: dev-ceph-block
        resources:
          requests:
            storage: 500M
      selector: {}

(screenshots attached)

@adevjoe

adevjoe commented Sep 9, 2020

I have the same problem sometimes. I tried restarting all the pods; my manifest and the resulting cluster state are below.

apiVersion: redis.opstreelabs.in/v1alpha1
kind: Redis
metadata:
  name: redis
spec:
  global:
    image: 'quay.io/opstree/redis:v2.0'
    imagePullPolicy: IfNotPresent
    password: Opstree@12345
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  master:
    service:
      type: ClusterIP
  mode: cluster
  redisExporter:
    enabled: true
    image: 'quay.io/opstree/redis-exporter:1.0'
    imagePullPolicy: Always
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
  size: 3
  slave:
    service:
      type: ClusterIP
  storage:
    volumeClaimTemplate:
      selector: {}
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
10.102.211.247:6379> cluster nodes
3af5f2fad054e7a898ce9604dab5f3b904d9872c 10.199.2.25:6379@16379 myself,master - 0 1599641626075 1 connected 0-5460
79ed1a5f8c8a57912d424ffa109798492a44f997 10.199.2.13:6379@16379 master,fail? - 1599641628581 1599641626075 3 connected 10923-16383
c48000914c0fdc73370fdb6832db0f9a7f616b86 10.199.2.17:6379@16379 slave,fail? 3af5f2fad054e7a898ce9604dab5f3b904d9872c 1599641627980 1599641626075 1 connected
dbb656d05bc5ace787665a268e533c9f625425b9 10.199.0.73:6379@16379 master,fail? - 1599641626978 1599641626075 4 connected 5461-10922
16e18678ec1ccf976014f192121dbb8e487e7b32 10.199.2.18:6379@16379 slave,fail? 79ed1a5f8c8a57912d424ffa109798492a44f997 1599641628581 1599641626075 3 connected
50148f4dd9e271803f21858951d3f5d4bd51b1d6 10.199.2.16:6379@16379 slave,fail? dbb656d05bc5ace787665a268e533c9f625425b9 1599641628581 1599641626075 4 connected
10.102.211.247:6379> cluster info
cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:5461
cluster_slots_pfail:10923
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:4
cluster_my_epoch:1
cluster_stats_messages_ping_sent:5
cluster_stats_messages_sent:5
cluster_stats_messages_received:0

@ianwatsonrh

ianwatsonrh commented Sep 16, 2020

Experiencing the same problem. Stale entries in nodes.conf are redirecting Redis clients to IPs that no longer exist.

I've worked around it quickly by extending the Redis image and amending the start_redis() command:

start_redis() {
    echo "Starting redis service....."
    # Announce this pod's current eth0 address so peers and clients are not pointed at a stale IP
    redis-server /etc/redis/redis.conf --cluster-announce-ip "$(ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)"
}

A more elegant solution is referenced in redis/redis#4289, but it was easier for me to amend the image than the operator.

@ianwatsonrh

It seems this is partially fixed on GitHub but not yet on OperatorHub.

@iamabhishek-dubey
Member

Fixed in #26

@egorksv

egorksv commented Jan 30, 2024

/reopen

@egorksv

egorksv commented Jan 30, 2024

This is still an issue for new clusters: an old version of nodes.conf (i.e. from before a cluster rebuild) is retained in the PVC.

Suggestion: add functionality to the RedisCluster reconciler that runs CLUSTER RESET HARD on new nodes when the cluster state is "Bootstrap".
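As a manual stopgap, the same thing can be done by hand against each freshly created pod before it tries to rejoin (a sketch only; the pod names and password handling here are assumptions for illustration):

# Wipe the stale cluster state (node ID, slots, known peers) on each new pod
for pod in redis-cluster-0 redis-cluster-1 redis-cluster-2; do
    kubectl exec "$pod" -- redis-cli -a "$REDIS_PASSWORD" CLUSTER RESET HARD
done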

@drivebyer
Collaborator

> This is still an issue for new clusters: an old version of nodes.conf (i.e. from before a cluster rebuild) is retained in the PVC.

In my environment, this issue only occurs when all Redis nodes are recreated simultaneously, and they cannot distinguish themselves from one another. This does not happen during a rolling update.

@egorksv

egorksv commented Jan 31, 2024

That's exactly what I'm talking about: when you deploy a new cluster and don't get all of the configuration right the first time (ESPECIALLY SSL/TLS, which is notoriously error-prone), a full cluster restart is required, and the fastest way to achieve that is to kill all running pods, which leads directly to this problem.
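When that happens, the only clean rebuild I've found is to drop the retained PVCs along with the cluster, so no stale nodes.conf survives into the new deployment (a sketch; the label selector is a guess and needs to match your resources):

kubectl delete -f redis.yaml         # remove the cluster so the operator stops recreating pods
kubectl delete pvc -l app=redis      # drop the retained volumes that still hold the old nodes.conf
kubectl apply -f redis.yaml          # recreate the cluster from a clean state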
