add VolumeGroupSnapshotClass for CephFS and RBD #2859

Open: wants to merge 2 commits into base: main

@ShravaniVangur (Contributor) commented Oct 17, 2024

Testing creation of VolumeGroupSnapshotClass and related functionalities:

  • CSI-drivers require VolumeGroupSnapshotClass for VolumeGroupSnapshot creation.
NAME                                             DRIVER                                  DELETIONPOLICY   AGE
ocs-storagecluster-cephfsplugin-groupsnapclass   openshift-storage.cephfs.csi.ceph.com   Delete           50m
ocs-storagecluster-rbdplugin-groupsnapclass      openshift-storage.rbd.csi.ceph.com      Delete           50m

  • Description of ocs-storagecluster-cephfsplugin-groupsnapclass:
Name:             ocs-storagecluster-cephfsplugin-groupsnapclass
Namespace:        
Labels:           <none>
Annotations:      <none>
API Version:      groupsnapshot.storage.k8s.io/v1alpha1
Deletion Policy:  Delete
Driver:           openshift-storage.cephfs.csi.ceph.com
Kind:             VolumeGroupSnapshotClass
Metadata:
  Creation Timestamp:  2024-10-23T08:18:54Z
  Generation:          1
  Resource Version:    77802
  UID:                 bd8437f3-4c7c-4307-9826-14d026ce343a
Parameters:
  Cluster ID:                                             openshift-storage
  csi.storage.k8s.io/group-snapshotter-secret-name:       rook-csi-cephfs-provisioner
  csi.storage.k8s.io/group-snapshotter-secret-namespace:  openshift-storage
  Fs Name:                                                ocs-storagecluster-cephfilesystem
Events:                                                   <none>

  • On creating VolumeGroupSnapshot:
NAME                     READYTOUSE   VOLUMEGROUPSNAPSHOTCLASS                         VOLUMEGROUPSNAPSHOTCONTENT                              CREATIONTIME   AGE
cephfs-groupsnapshot01   true         ocs-storagecluster-cephfsplugin-groupsnapclass   groupsnapcontent-e5a91abd-52ae-4dac-a6d2-6ad1a3f4f524   7m47s          37m

  • Restoring CephFS volumegroupsnapshot to a new PVC (cephfs-pvc-01-restore here):
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
cephfs-pvc-01                 Bound    pvc-29d36e86-62bc-4b24-b7e2-359b6a522c6b   1Gi        RWO            ocs-storagecluster-cephfs     <unset>                 56m
cephfs-pvc-01-restore         Bound    pvc-58173f95-bf0c-456d-9636-71e39783a16f   1Gi        RWX            ocs-storagecluster-cephfs     <unset>                 9s

  • Creating a pod to utilise the new PVC
NAME                   READY   STATUS    RESTARTS   AGE
csi-cephfs-vgsc-test   1/1     Running   0          113s

@ShravaniVangur ShravaniVangur force-pushed the volgrp-snapclass branch 2 times, most recently from acdabd0 to de40e7b Compare October 18, 2024 06:15
@Madhu-1 (Member) left a comment

/hold

The API is not available in Beta yet.

@nixpanic PTAL

disable bool
}

var driverName, driverValue string
Member

Any specific reason to have a global variable?

Contributor Author

No, changing them to local variables.

break
}

r.Log.Info("Uninstall: Deleting GroupSnapshotClass.", "GroupSnapshotClass", klog.KRef(existing.Namespace, existing.Name))
Member

We don't need to log the Namespace for groupsnapshot, as it's a cluster-scoped resource.

Member

This is applicable for all the places where we are logging the groupsnapshot name


vsc := vscc.groupSnapshotClass
existing := &groupsnapapi.VolumeGroupSnapshotClass{}
err := r.Client.Get(context.TODO(), types.NamespacedName{Name: vsc.Name, Namespace: vsc.Namespace}, existing)
Member

Instead of context.TODO, use the context inside StorageClusterReconciler (r.ctx).
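A minimal sketch of the suggestion above, assuming the reconciler keeps a context in a ctx field as the comment describes. The names reconciler, getFunc, fetchClass, fakeGet and newReconciler are hypothetical stand-ins and not the operator's code. The sketch only illustrates that a stored context lets cancellation reach the client call, which a fresh context.TODO() would not.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// reconciler is a hypothetical stand-in for StorageClusterReconciler;
// only the ctx field mentioned in the review comment is modeled.
type reconciler struct {
	ctx context.Context
}

// getFunc is a hypothetical stand-in for a client Get call.
type getFunc func(ctx context.Context, name string) error

// fetchClass passes the reconciler's own context to the client call
// instead of creating a fresh context.TODO().
func (r *reconciler) fetchClass(name string, get getFunc) error {
	return get(r.ctx, name)
}

// fakeGet fails once the context it receives has been cancelled.
func fakeGet(ctx context.Context, name string) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("get %q: %w", name, err)
	}
	return nil
}

// newReconciler returns a reconciler with a cancellable context (illustrative helper).
func newReconciler() (*reconciler, context.CancelFunc) {
	ctx, cancel := context.WithCancel(context.Background())
	return &reconciler{ctx: ctx}, cancel
}

func main() {
	r, cancel := newReconciler()
	fmt.Println(r.fetchClass("example-class", fakeGet))

	// After cancellation the same call reports context.Canceled.
	cancel()
	err := r.fetchClass("example-class", fakeGet)
	fmt.Println(errors.Is(err, context.Canceled))
}
```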

case errors.IsNotFound(err):
r.Log.Info("Uninstall: GroupSnapshotClass not found, nothing to do.", "GroupSnapshotClass", klog.KRef(sc.Namespace, sc.Name))
default:
r.Log.Error(err, "Uninstall: Error while getting GroupSnapshotClass.", "GroupSnapshotClass", klog.KRef(sc.Namespace, sc.Name))
Member

Don't we need to retry if there is any deletion error? I see we are returning nil at the end of this function.

Contributor Author

Yes, adding that in the switch case.

@@ -186,6 +188,7 @@ func (r *StorageRequestReconciler) SetupWithManager(mgr ctrl.Manager) error {
Watches(&rookCephv1.CephClient{}, enqueueForOwner).
Watches(&storagev1.StorageClass{}, enqueueStorageConsumerRequest).
Watches(&snapapi.VolumeSnapshotClass{}, enqueueStorageConsumerRequest).
Watches(&groupsnapapi.VolumeGroupSnapshotClass{}, enqueueStorageConsumerRequest).
Member

I think we don't need snapshot, groupsnapshot, pv, storageclass permission in storagerequest_controller. @leelavg can you please confirm it?

Contributor

Hmm, I didn't get the GH notification 🤔. Anyway, yes, we don't watch these resources; we only need the struct to be encoded in the gRPC response.

Just rechecked the existing code and I don't think anything is necessary for provider mode. Do we need more info than the below, in addition to the changes to storageclaimcontroller in https://github.com/red-hat-storage/ocs-client-operator/pull/168/files?

&pb.ExternalResource{
Name: "ceph-rbd",
Kind: "VolumeGroupSnapshotClass",
Data: mustMarshal(map[string]string{
"csi.storage.k8s.io/group-snapshotter-secret-name": provisionerSecretName,
})},

},
Driver: generateNameForSnapshotClassDriver(SnapshotterType(groupSnaphotterType)),
Parameters: map[string]string{
"clusterID": instance.Namespace,
Member

Where are we setting the filesystem name and blockpool in the class?

Contributor Author

They are set through the driverName and driverValue variables, whose values are assigned by the setParameterBasedOnSnapshotterType function.

Member

Right, this confused me a bit too; it is well hidden. driverName is not really a very suitable name, as the parameter is called fsName for CephFS or pool for RBD. But I expect this to work correctly. Having a test that validates these kinds of parameters would be nice to have.
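As an illustration of the parameter selection discussed above, here is a small sketch of the key/value mapping and the kind of check a test could make. It relies only on what the thread states: the key is fsName for CephFS and pool for RBD, alongside clusterID. The names snapshotterType, paramKeyFor and classParameters are hypothetical and are not the functions in the PR.

```go
package main

import "fmt"

// snapshotterType is a hypothetical stand-in for the PR's snapshotter type.
type snapshotterType string

const (
	rbdType    snapshotterType = "rbd"
	cephfsType snapshotterType = "cephfs"
)

// paramKeyFor returns the driver-specific parameter key:
// "fsName" for CephFS and "pool" for RBD, as described in the review thread.
func paramKeyFor(t snapshotterType) (string, error) {
	switch t {
	case cephfsType:
		return "fsName", nil
	case rbdType:
		return "pool", nil
	default:
		return "", fmt.Errorf("unsupported snapshotter type %q", t)
	}
}

// classParameters builds a Parameters map for a group snapshot class
// (illustrative helper; secret parameters are left out for brevity).
func classParameters(t snapshotterType, clusterID, paramValue string) (map[string]string, error) {
	key, err := paramKeyFor(t)
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"clusterID": clusterID,
		key:         paramValue,
	}, nil
}

func main() {
	params, err := classParameters(cephfsType, "openshift-storage", "ocs-storagecluster-cephfilesystem")
	if err != nil {
		panic(err)
	}
	fmt.Println(params["clusterID"], params["fsName"])
}
```

Naming the pair as a parameter key and value, rather than driverName and driverValue, keeps the mapping visible at the call site, which is also what a later comment in this thread asks for.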

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 18, 2024
@nixpanic (Member)

The API is not available in Beta yet.

See kubernetes-csi/external-snapshotter#1150 for the BETA status PR.

@Madhu-1 (Member) commented Oct 23, 2024

we might need to have a VGSC CRD check as ocs-operator 4.18 needs to run on OCP 4.16 without any problem for EUS to EUS upgrade? @iamniting @Nikhil-Ladha thoughts?

Comment on lines 13 to 16
return "ms_mode=secure"
}

// If Encryption is not enabled, but Compression or RequireMsgr2 is enabled, use prefer-crc mode
if sc.Spec.Network != nil && sc.Spec.Network.Connections != nil &&
((sc.Spec.Network.Connections.Compression != nil && sc.Spec.Network.Connections.Compression.Enabled) ||
sc.Spec.Network.Connections.RequireMsgr2) {
return "ms_mode=prefer-crc"
}

// Network spec always has higher precedence even in the External or Provider cluster. so they are checked first above

// None of Encryption, Compression, RequireMsgr2 are enabled on the StorageCluster
// If it's an External or Provider cluster, We don't require msgr2 by default so no mount options are needed
if sc.Spec.ExternalStorage.Enable || sc.Spec.AllowRemoteStorageConsumers {
return "ms_mode=legacy"
}
// If none of the above cases apply, We set RequireMsgr2 true by default on the cephcluster
// so we need to set the mount options to prefer-crc
// If encryption is not enabled, use prefer-crc mode
Member

I think this auto-generated file change is already merged, please rebase your PR once to fix this.

@Nikhil-Ladha (Member)

we might need to have a VGSC CRD check as ocs-operator 4.18 needs to run on OCP 4.16 without any problem for EUS to EUS upgrade? @iamniting @Nikhil-Ladha thoughts?

Yep, as going forward the plan will be to upgrade ODF first, it is advisable to have these checks in place for new changes.
We can make use of the availCRD check that has been recently implemented to check for the CRD before trying to create the SC.

@Nikhil-Ladha (Member) left a comment

Break the PR into multiple commits: keep the generated changes in one commit and the code changes in another.

@iamniting (Member)

we might need to have a VGSC CRD check as ocs-operator 4.18 needs to run on OCP 4.16 without any problem for EUS to EUS upgrade? @iamniting @Nikhil-Ladha thoughts?

Yep, as going forward the plan will be to upgrade ODF first, it is advisable to have these checks in place for new changes. We can make use of the availCRD check that has been recently implemented to check for the CRD before trying to create the SC.

I agree we should have such checks, but let's not use availCrds if we are not watching the resource. Otherwise, our controller may restart. Let's have a plain check.

@iamniting (Member) left a comment

Please put the generated changes in a separate commit.

@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 24, 2024
@ShravaniVangur ShravaniVangur force-pushed the volgrp-snapclass branch 2 times, most recently from 3ec52f3 to 6c2c34e Compare October 24, 2024 18:14
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 24, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 24, 2024
@ShravaniVangur (Contributor Author)

I have added a plain check for VGSC CRD as well. Please do review it. @Madhu-1 @iamniting @Nikhil-Ladha

Comment on lines 450 to 452
if vgsc {
objs = append(objs, &ocsGroupSnapshotClass{})
}
Member

The check should be inside the ensure funcs not here.

@@ -244,6 +245,7 @@ func (r *StorageClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
).
Watches(&storagev1.StorageClass{}, enqueueStorageClusterRequest).
Watches(&volumesnapshotv1.VolumeSnapshotClass{}, enqueueStorageClusterRequest).
Watches(&volumegroupsnapshotv1a1.VolumeGroupSnapshotClass{}, enqueueStorageClusterRequest).
Member

We can make use of available CRD feature as we are watching the resource.

@iamniting (Member) left a comment

Can you please fix the tests?

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 14, 2024
openshift-ci bot (Contributor) commented Dec 12, 2024

New changes are detected. LGTM label has been removed.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2024
openshift-ci bot (Contributor) commented Dec 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nixpanic, ShravaniVangur
Once this PR has been reviewed and has the lgtm label, please ask for approval from iamniting. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 12, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 12, 2024
@ShravaniVangur ShravaniVangur force-pushed the volgrp-snapclass branch 5 times, most recently from bd9d6ce to b4de5dc Compare December 12, 2024 13:32
"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

type GroupSnapshotterType string
Member

No need to export this type, as it's not used outside?

Contributor Author

@Madhu-1 Currently no, but this would mean changing all the variable types from GroupSnapshotterType to string directly in functions such as newVolumeGroupSnapshotClass and in the generate.go file as well. Is this the expected format?

Member

Suggested change
- type GroupSnapshotterType string
+ type groupSnapshotterType string

Can we do something like this, as it's used outside of this file?

}

func newVolumeGroupSnapshotClass(instance *ocsv1.StorageCluster, groupSnaphotterType GroupSnapshotterType) *groupsnapapi.VolumeGroupSnapshotClass {
driverType, driverValue := setParameterBasedOnSnapshotterType(instance, groupSnaphotterType)
Member

Instead of driverType and driverValue, shouldn't it be paramKey and paramValue?


err := r.createGroupSnapshotClasses(vgsc)
if err != nil {
return reconcile.Result{}, nil
Member

It should be returning err, shouldn't it?

case errors.IsNotFound(err):
r.Log.Info("Uninstall: GroupSnapshotClass not found, nothing to do.", "GroupSnapshotClass", klog.KRef("", sc.Name))
default:
r.Log.Error(err, "Uninstall: Error while getting GroupSnapshotClass.", "GroupSnapshotClass", klog.KRef("", sc.Name))
Member

Don't we need to return err if something fails? The same problem exists in the storageclass and volumesnapshot class code as well. @iamniting can you please confirm this?

Contributor Author

@Madhu-1 For the ensureCreated function the error does need to be returned. For the ensureDeleted function the error is being returned in the first case. Only when it is not found we are logging it. This could be the case when r.AvailableCrds[VolumeGroupSnapshotClassCrdName] is false.

Member

I think in that case it will be errors.IsNotFound, not any other error. This could lead to a case where we leave stale classes behind due to some internal API server error or something else. I will let @iamniting confirm it; apart from this one, everything else LGTM.

Member

If there is an error we should return an error IMO.
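The error handling the reviewers converge on can be sketched as follows: treat "not found" as nothing to do, and return every other error so the reconcile is retried and no stale class is left behind. This sketch uses only the standard library. errNotFound stands in for the Kubernetes NotFound error that the PR checks with errors.IsNotFound, and classStore, fakeStore, errUnavailable and ensureDeleted are hypothetical names, not the operator's code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes NotFound API error (assumption for this sketch).
var errNotFound = errors.New("not found")

// errUnavailable models a transient failure such as an API server error (assumption).
var errUnavailable = errors.New("api server unavailable")

// classStore is a hypothetical minimal client for cluster-scoped classes.
type classStore interface {
	Get(name string) error
	Delete(name string) error
}

// ensureDeleted deletes the named class and returns an error for every
// failure except "not found", so the caller can retry.
func ensureDeleted(store classStore, name string) error {
	err := store.Get(name)
	switch {
	case err == nil:
		if delErr := store.Delete(name); delErr != nil {
			return fmt.Errorf("deleting %q: %w", name, delErr)
		}
		return nil
	case errors.Is(err, errNotFound):
		// Already gone: nothing to do.
		return nil
	default:
		return fmt.Errorf("getting %q: %w", name, err)
	}
}

// fakeStore is an in-memory classStore used to exercise the sketch.
type fakeStore struct {
	getErr    error
	deleteErr error
	deleted   []string
}

func (f *fakeStore) Get(string) error { return f.getErr }

func (f *fakeStore) Delete(name string) error {
	if f.deleteErr != nil {
		return f.deleteErr
	}
	f.deleted = append(f.deleted, name)
	return nil
}

func main() {
	fmt.Println(ensureDeleted(&fakeStore{getErr: errNotFound}, "example-class"))
	fmt.Println(ensureDeleted(&fakeStore{getErr: errUnavailable}, "example-class"))
}
```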

CSI drivers require VolumeGroupSnapshotClass for VolumeGroupSnapshot.

Signed-off-by: ShravaniVangur <[email protected]>
Updates the go mod dependencies.

Signed-off-by: ShravaniVangur <[email protected]>
func generateNameForSnapshotClassDriver(snapshotType SnapshotterType) string {
return fmt.Sprintf("%s.%s.csi.ceph.com", util.StorageClassDriverNamePrefix, snapshotType)
}

func setParameterBasedOnSnapshotterType(instance *ocsv1.StorageCluster, groupSnapshotterType groupSnapshotterType) (string, string) {
Member

Don't use the same name for type and variable.

Comment on lines +27 to +28
groupSnapshotterSecretName = "csi.storage.k8s.io/group-snapshotter-secret-name"
groupSnapshotterSecretNamespace = "csi.storage.k8s.io/group-snapshotter-secret-namespace"
Member

Add Key at the end of both variable names.

Comment on lines +160 to +161
existing.ObjectMeta.OwnerReferences = sc.ObjectMeta.OwnerReferences
sc.ObjectMeta = existing.ObjectMeta
Member

Why do we need to alter the OwnerReferences and ObjectMeta while deleting?

@iamniting (Member)

@ShravaniVangur Is the code tested? Also, is updating the GroupSnapshotClass allowed?

@malayparida2000 (Contributor) left a comment

@ShravaniVangur Is this intended to go in 4.18? If yes we would need a Jira Bug.
If this is intended for 4.19 then if there is a Jira Story/Upstream Issue for this please attach it here.

Please mark the conversations that have already been resolved as resolved. Normally the reviewer would do that, but here the many open conversations make it confusing to take a final look.

@Madhu-1 still has a hold on this PR, please discuss with him about the reason & get it resolved.

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
8 participants