
There is a new volume mount node-conf-redis-cluster defined for the 0.15.2 version and it's not defined in values.yaml #114

Open
csuryac opened this issue Jul 5, 2023 · 29 comments

Comments

@csuryac

csuryac commented Jul 5, 2023

There is a new volume mount node-conf-redis-cluster defined for the 0.15.2 version with a default of 1Mi, but it is not defined in values.yaml. Is it possible to declare it in values.yaml so that we can increase the size?

@shubham-cmyk
Member

shubham-cmyk commented Jul 5, 2023

We can do that, but I thought the volume should only store node.conf, so the cluster state keeps running if the main volume is not attached to it.

I think if you want to increase the size, you should increase the other volume, the one that stores the data; changing this one might not be a good idea.

What do you think about that?

One more thing: there has to be a default storage class in your k8s cluster to provision the 1Mi volume.
You might feel this is a bit hardcoded, but we are ready to accept a change if I get a good recommendation.
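
For the data volume, the size can be set through the chart's values, roughly along these lines (a sketch only; the exact keys may differ between chart versions, so check the chart's values.yaml):

# sketch only — verify the exact key names in the redis-cluster chart's values.yaml
storageSpec:
  volumeClaimTemplate:
    spec:
      # storageClassName: <your-storage-class>   # optional; omit to use the cluster default
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi   # size of the Redis data volume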

@csuryac
Author

csuryac commented Jul 6, 2023

I agree, but with some of the private cloud providers the default minimum volume size is 5Gi, so getting a volume as small as 1Mi might not be possible. Are there any alternative approaches for this?

@shubham-cmyk
Member

This is actually a real problem I have seen. I might try a node volume bind, but I don't want to be stuck to that node; this will probably be addressed in the next release.

@ZleFox

ZleFox commented Jul 11, 2023

I think it is good practice, and it is expected when you use persistent storage, to be able to define the storage class.

@csuryac
Author

csuryac commented Jul 20, 2023

@shubham-cmyk the latest version 0.15.3 still has this issue:

✦ ❯ helm install redis-cluster ot-helm/redis-cluster \
    --set redisCluster.clusterSize=3 --namespace ot-operator
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(RedisCluster.spec.storage): unknown field "nodeConfVolumeClaimTemplate" in in.opstreelabs.redis.redis.v1beta1.RedisCluster.spec.storage

@shubham-cmyk
Member

This is because you have not updated the CRD.
If the CRD from the previous version is still present, helm install won't upgrade it. You have to delete the CRD manually; only then will the install succeed.
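
Roughly, the manual steps look like this (take a backup first, since deleting the CRD also removes the custom resources created from it; the names below are examples):

# see which Redis CRDs are currently installed
kubectl get crd | grep redis

# delete the outdated CRD — this also deletes the RedisCluster objects using it, so back up first
kubectl delete crd redisclusters.redis.redis.opstreelabs.in   # use the exact name from the list above

# reinstall/upgrade the operator chart so the new CRDs are created, then install the cluster chart again
helm upgrade --install redis-operator ot-helm/redis-operator --namespace ot-operator
helm upgrade --install redis-cluster ot-helm/redis-cluster --set redisCluster.clusterSize=3 --namespace ot-operator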

@revathyr13

revathyr13 commented Aug 2, 2023

@shubham-cmyk or @iamabhishek-dubey

I am also facing the same issue, so may I know what the exact steps are for upgrading the CRD?

If we already have a CRD in our cluster, won't kubectl apply -f newcrdfile.yaml override the existing (old) CRD?

Regarding "You have to delete the CRD manually; only then will it install":
before deleting the CRD we have to delete the RedisClusters installed in the cluster first, right? In that case we may lose the data.
I couldn't find any document that explains upgrading the operator or CRD while keeping the existing clusters. Can someone please share one?

@shubham-cmyk
Member

Yes, you have to uninstall and reinstall it. To prevent data loss, you have to take a backup beforehand and restore it afterwards.

@shubham-cmyk
Member

@revathyr13 I will write a migration doc; I think most of the users are facing this.

@revathyr13

@shubham-cmyk
Thanks a lot for the update. Any ETA for the migration doc? I hope the doc will cover Redis standalone as well as cluster migration steps.

@shubham-cmyk
Member

shubham-cmyk commented Aug 8, 2023

We do have some scripts that can take a backup to S3 and restore from it.
You could check those out; I will write a basic doc today or tomorrow that shows how to use the scripts efficiently.

Check the scripts: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts

There are some other options available for the migration, like Velero; you could check that out as well.
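
With Velero, for example, it would be something along these lines (namespace and backup name are placeholders):

# back up everything in the namespace that holds the Redis cluster
velero backup create redis-migration --include-namespaces ot-operator

# later, on the target cluster, restore from that backup
velero restore create --from-backup redis-migration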

@revathyr13

@shubham-cmyk

Thank you

@revathyr13

Hello @shubham-cmyk

I tried the backup scripts from my end.

As per my understanding, the backup script creates RDB snapshots of each master node and uploads them to AWS/GCP S3 buckets; in our case it was AWS. This part works fine for me.

However, the restore part didn't work.

As per the script https://github.com/OT-CONTAINER-KIT/redis-operator/blob/master/scripts/restore/restore.bash, it restores the latest RDB snapshot of each master pod to the respective master pod, right?
I tried it that way, but in the destination Redis cluster I did not get the data/keys that exist in the source cluster.

So please briefly explain the backup/restore process. Also, do we need to take RDB snapshots of all the pods in the source cluster [master and slave] and restore them? I migrated the data from a Redis 6 cluster running on operator version 0.10 to Redis 7 running on operator version 0.15. I am not sure whether the backup/restore steps change depending on the operator version.
Awaiting your reply.

@shubham-cmyk
Member

Yes, you are right.
Just to confirm: did you restore them in the initContainers? If you restore after the server has already started, the data can't be recovered.
I have written a basic doc; I will upload it now.
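
The key point is ordering: the snapshot has to be in place before redis-server starts. In plain Kubernetes terms the idea is something like this (a generic illustration of the ordering, not the operator's exact CRD fields):

# generic pod-spec sketch — the names and image are placeholders
initContainers:
  - name: restore-dump
    image: restore-image:latest        # a container that runs the restore script
    volumeMounts:
      - name: redis-data
        mountPath: /data               # the init container places the snapshot at /data/dump.rdb
containers:
  - name: redis
    image: quay.io/opstree/redis:v7.0.11
    volumeMounts:
      - name: redis-data
        mountPath: /data               # redis-server loads dump.rdb from here on startup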

@revathyr13

revathyr13 commented Aug 25, 2023

I created a new cluster and restored the snapshots from AWS directly to the Redis master pods. I think the Redis cluster was running at that time. In some docs I noticed that we have to stop the Redis service before restoring the dump file, but as I couldn't find any method to do that, I just restored without stopping the service.

Please share the backup doc so that I can retry with its help. Thank you.

@shubham-cmyk
Member

shubham-cmyk commented Aug 25, 2023

@revathyr13 Yes, we have to use the initContainer for that.
You may find the docs here: OT-CONTAINER-KIT/redis-operator#588

@revathyr13

revathyr13 commented Aug 25, 2023 via email

@shubham-cmyk
Member

It is not published on the website yet, but you can review these links:

backup: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts/backup
restore: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/scripts/restore

There are backup.md and restore.md there, plus we have the manifest, Docker image, and env_vars.env that are used in this process.

@revathyr13

Hello @shubham-cmyk

I tried passing the restore Docker image as an init container. The dump files were restored properly as dump.rdb, but the restoration was still not successful. Let me explain the restoration steps I tried:

  1. Initially I tried to restore after disabling appendonly AOF [adding appendonly no in the external config]. The dump.rdb files were successfully restored to the data directory of each pod. However, the cluster join failed with the error below.

10.236.70.209:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.\n"}

I tried flushdb as well, but that didn't help.

  2. The second time I tried enabling appendonly AOF, which is the default setting of the operator. This time as well, the dump.rdb files were successfully restored via init containers and the appendonly directories were created. The cluster join also worked fine. However, no data/keys could be fetched from Redis; I get nil for all the keys that have values in the backed-up Redis cluster, and GET on those keys also returns nil.

I am not sure if I am missing anything in the restore process. I am attaching the manifest I used.

cluster.txt

Please have a look and let me know your thoughts.
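
For reference, the state that makes the join fail that way can be inspected with standard redis-cli commands, for example:

# does the node already know other cluster nodes?
redis-cli -h 10.236.70.209 -p 6379 cluster nodes

# does the node already hold keys in database 0?
redis-cli -h 10.236.70.209 -p 6379 dbsize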

@shubham-cmyk
Member

shubham-cmyk commented Sep 5, 2023

@revathyr13
What Redis image and operator image are you using?
Also, please join Slack:
https://github.com/OT-CONTAINER-KIT/helm-charts#contact-information
I would be more available there.

@revathyr13

revathyr13 commented Sep 5, 2023

Hello @shubham-cmyk ,

Thanks for the update

Version details

Source cluster
Operator version: 0.10.0
Redis version or image : opstree-redis:v6.2.5

Destination Cluster:
Operator version : 0.15.0
Redis version : Tried both in opstree-redis:v6.2.5 and opstree-redis:v7.0.5

@shubham-cmyk
Member

shubham-cmyk commented Sep 5, 2023

You should use v7.0.11 for v0.15.0 @revathyr13

@revathyr13

revathyr13 commented Sep 6, 2023

@shubham-cmyk

Thanks for the update. I tried with version v7.0.11 as well; no luck.
The restoration of the dump.rdb files worked fine and the RedisCluster was built up by the operator, but I couldn't see any keys in the cluster.

bash-5.1$ ls -la
total 2664212
drwxrwxrwx 4 root root 4096 Sep 6 05:08 .
drwxr-xr-x 1 root root 57 Sep 6 05:08 ..
drwxr-xr-x 2 redis redis 4096 Sep 6 05:08 appendonlydir
-rw-r--r-- 1 root root 2728122568 Sep 6 05:07 dump.rdb
drwx------ 2 root root 16384 Sep 6 05:06 lost+found
bash-5.1$
10.233.68.50:6379> get devportal:re:XX:XXXX
(nil)
10.233.68.50:6379>

The above key holds a value in the source cluster.

@shubham-cmyk
Member

Let me inspect this issue and see what the problem might be.
@revathyr13

@shubham-cmyk
Member

shubham-cmyk commented Sep 6, 2023

If the dump.rdb files are being placed properly, it means the scripts are working fine.
Since we copy the dump.rdb in the initContainer, we can be sure the redis-server has not started yet.

This might be an issue on the Redis side; I have to revisit the restore docs for restoring via dump.rdb.

I am replaying the scenario right now and will update.
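
In the meantime, a quick way to check whether a given node actually loaded anything from the RDB (example commands; the host and key are the ones from your earlier output):

# number of keys the node is currently holding
redis-cli -h 10.233.68.50 -p 6379 dbsize

# in cluster mode, -c makes redis-cli follow MOVED redirects to the node that owns the key's slot
redis-cli -c -h 10.233.68.50 -p 6379 get devportal:re:XX:XXXX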

@shubham-cmyk
Member

shubham-cmyk commented Sep 6, 2023

@revathyr13

I just replayed the scenario: the keys were loaded, but the cluster was not serving properly, so not all keys were loaded. I am working on this.

[screenshot attached]

Check there: I have added a few manifests that I used and fixed a bug so that there is no restore to the follower pods.
OT-CONTAINER-KIT/redis-operator#609

@revathyr13

@shubham-cmyk
Thanks for checking. Waiting for further updates.

@revathyr13

@shubham-cmyk

Any new updates?

@shubham-cmyk
Member

@revathyr13

I have updated the scripts for backup and restore.
You may find the example here: https://github.com/OT-CONTAINER-KIT/redis-operator/tree/master/example/v1beta2/backup_restore

The restore on operator v0.15.1 is failing for now.
I am fixing that.

I have opened an issue: OT-CONTAINER-KIT/redis-operator#625
Let's move the conversation there.
