
Design Proposal: ObjectBucketClaim CRD and Controller #2483

Closed
copejon wants to merge 9 commits from the object-bucket-proposal branch

Conversation

@copejon (Contributor) commented Jan 11, 2019

**Description of your changes:**

Design documentation for the addition of an ObjectBucketClaim CRD and controller for the provisioning of S3 buckets. This design should be agnostic to the underlying provider.

Which issue is resolved by this Pull Request:
Resolves #2480

Checklist:

  • Documentation has been updated, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md

[skip ci]

add design doc for new CephObjectBucket CRD and controller

Signed-off-by: Jon Cope <[email protected]>

This design closes the loop for Rook-Ceph object stores with a CephObjectBucket controller and CRD. Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket. One ConfigMap will be created per CephObjectBucket. ConfigMap names will be deterministic and based on the name of the CephObjectBucket. This gives users the opportunity to define their Pod spec at the same time as the CephObjectBucket. Additionally, the CephObjectBucket and Pod can safely be deployed at the same time thanks to built in synchronization in Kubernetes.

Bucket deletion in the early stages of this controller design will not be addressed beyond cleaning up generated resources. Deleting a CephObjectBucket will only cause deletion of the associated ConfigMap. The controller will not delete the bucket from the object store.
Member:

Avoiding deletion is because it is considered an admin action, correct? I saw that elsewhere, but you might mention it here as well.

Contributor Author:

That's correct. The consensus in early drafts was that it was preferable to suspend the bucket on API resource deletion; Ceph seems to supply this functionality. The rationale was that it would be far too easy to accidentally destroy your buckets.

1. The user deletes the CephObjectBucket. This triggers an automated cleanup sequence:
   a. The CephObjectBucket is marked for deletion first and left in the foreground. Deletion is blocked by a finalizer on the CephObjectBucket, pending step (b).
   b. The respective ConfigMap is deleted.
   c. The finalizer in the CephObjectBucket is removed and the object is garbage collected.
Member:

Since we're not deleting the underlying bucket, do we still need the finalizer? If we're only cleaning up K8s resources, we shouldn't need a finalizer.

Contributor Author:

Good point, I'll reword this. The intent is to set an ownerRef on the CM as a child of the CephObjectBucket with `blockOwnerDeletion: true`. It seems like a fair safeguard against orphaning COBs or their CMs.

### CephObjectBucket

```yaml
apiVersion: object.k8s.io/v1alpha
Member:

Currently we're adding all Ceph CRDs to the apiVersion ceph.rook.io/v1 since a single api group must all have the same version. But separately I'd like to find a way to allow new CRDs to bake until we really think they're stable.

Contributor Author:

I'd also appreciate that. Adding this as a v1 when additions / changes are going to be made would be misleading. I'll change it to v1 for the time being since that's the standard practice.

finalizers:
- rook-ceph.io/provisioner/my-bucket-1
data:
BUCKET_HOST: http://my-store-url-or-ip
Member:

For naming consistency it would be nice if these vars weren't all caps. Is this necessary to make it easier to mount the configmap as env vars with the expected name?

Contributor Author:

This was done to adhere to bash conventions for env var names when mounting CMs with `envFrom:`, where the entire CM is exported into the env. This is also done in the k8s docs.
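
For illustration, a minimal sketch of the `envFrom:` pattern referenced here; the Pod name and image are hypothetical, and the ConfigMap name reuses the `my-bucket-1` placeholder from this proposal:

```yaml
# Illustrative only: exporting the bucket ConfigMap into a container's environment,
# so all-caps keys (e.g. BUCKET_HOST) surface directly as env vars.
apiVersion: v1
kind: Pod
metadata:
  name: s3-client          # hypothetical Pod name
  namespace: dev-user
spec:
  containers:
  - name: app
    image: example/s3-app  # hypothetical image
    envFrom:
    - configMapRef:
        name: my-bucket-1  # the generated bucket ConfigMap
```
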

@copejon (Contributor Author) commented Jan 14, 2019:

As for the field names, if these are a little too generic, I'm fine with renaming them to something more explicit.

Member:

These names look good. My only thought is that we might add the ROOK_ prefix to them if we think there might be a conflict with other env vars, but not sure it's worth worrying about.

Contributor Author:

I think that's probably worth the small effort to change it. Will do.

provisioner: kubernetes.io/rook-ceph-operator
parameters:
# namespace/objectStoreService
objectStore: rook-ceph.io/objectStoreName
Member:

Instead of namespace/objectStoreName, why not have two separate properties for namespace and objectStore?

Member:

was rook-ceph.io intended for the namespace or should this be rook-ceph?

@copejon (Contributor Author) commented Jan 14, 2019:

Instead of namespace/objectStoreName, why not have two separate properties for namespace and objectStore?

I'm fine with that, will change.

was rook-ceph.io intended for the namespace or should this be rook-ceph?

nope! that's a typo, will fix

uid: 1234-qwer-4321-rewq
apiVersion: object.k8s.io/v1
kind: CephObjectBucket
finalizers:
Contributor Author:

@travisn We've tried to make use of finalizers to make the system more robust against user error. Do you think this is a better option than either:

  1. let it be deleted, do nothing
  2. let it be deleted but regenerate it

Is there a common practice for this in rook?

Member:

In Rook we assume the user is an admin and they will have full access to the resources. If they delete resources rook relies on, it could kill the cluster. I don't recall hearing feedback from users that they think this is a problem.

However, in this case, it's not the admin who would be affected. Perhaps end users might be more prone to accidental operations? In my experience, finalizers should be used very sparingly. You have to rely on the rook operator to automate removal of the finalizer. If the operator is stopped before a finalizer is removed, the resource will live forever. As an example, we do have a finalizer on the CephCluster CRD and because users might stop the operator before removing the cluster, we end up with special cleanup documentation.

Contributor:

@travisn For now I'm ok with this level of manual cleanup in cases where the operator is unavailable. At least the finalizer can be removed and the protected object deleted, but normally the operator would handle deletion of the finalizer.

### Assumptions

- The `storageClass` field has been added to the CephObjectStoreUser API
- Rook-Ceph Object has been deployed in a Kubernetes Cluster ([_see:_ quick start](https://rook.github.io/docs/rook/master/object.html))
Member:

this link is broken for me, did you mean https://rook.io/docs/rook/master/ceph-object.html?


### Assumptions

- The `storageClass` field has been added to the CephObjectStoreUser API
Member:

is this strictly necessary to implement this new bucket CRD and controller? it looks like this proposal could still be implemented without the StorageClass being added to the CephObjectStoreUser CRD.

@jeffvance (Contributor) commented Jan 16, 2019:

@jbw976 Adding storage class to the user CRD is not necessary for the bucket design. The bucket CRD just needs to know the object store service and secret. As long as the name of the key-pair secret is deterministic, the CephObjectStoreUser is not required at all.

Update: per some discussions with @copejon, it is not that important that the secret name be deterministic. Although that's possible with a minor (breaking) change to the rook object user, the secret name may not be deterministic for other object stores. Also, considering the brown-field case where the user wants to wrap an ObjectBucketClaim around an existing bucket and creds, the secret name may typically be non-deterministic. It's not much of a burden on the app pod developer to list the target secret before creating her pod.


- The `storageClass` field has been added to the CephObjectStoreUser API
- Rook-Ceph Object has been deployed in a Kubernetes Cluster ([_see:_ quick start](https://rook.github.io/docs/rook/master/object.html))
- The `CephObjectStoreUser` has been created in the developer's namespace and the associated credentials are stored in a `Secret`
Member:

what do you think of the possibility of generating a user on-demand for the bucket if there is no user specified? or do we want to be very explicit and keep that as a separate and required pre-requisite step?

Contributor:

@copejon and I like that idea and it has been mentioned before. What is your opinion on phasing this to a later pr, or would this be the more common use case?

Contributor Author:

@jbw976 @jeffvance Given that we're changing direction to a generalized provisioner rather than a ceph specific one, this would be a step back for us. The reason being that we'd be coupled (again) to rook-ceph CRDs which is what we're trying to avoid.

@jeffvance (Contributor) commented Jan 19, 2019:

...we'd be coupled (again) to rook-ceph CRDs which is what we're trying to avoid.

+1


[1] `bucketName` and `generateBucketName` are mutually exclusive. `generateBucketName` is expected to be the common usage; it produces a random bucket name beginning with the given value. If `bucketName` is defined, that value alone is used as the name of the bucket. Within an object store, bucket names must be unique, so using `bucketName` carries a higher chance of name collision. The API will continue retrying if the bucket cannot be created.

### CephObjectBucketConfigMap
Member:

what do you think about capturing all connection information, including both location information and credentials information, into a single secret? that may be an easier model for apps/pods to consume everything they need to connect from a single k8s object. thoughts?

Contributor:

We've considered adding the endpoint to the secret and eliminating the configMap. A couple of thoughts:

  1. the endpoint value doesn't need to be encrypted/hidden so it better "fits" the configMap use case.
  2. if I have one user (and thus 1 secret) and I want to create two buckets for the same user, then the secret would need a list of endpoints. But if the real bucket name is randomized then the secret needs to associate the endpoint to the bucket CRD name.

@copejon (Contributor Author) commented Jan 16, 2019:

@jbw976 Do you mean copying the access keys into a new secret? It would definitely simplify things but it does mean that changes to the secret (changing credentials) wouldn't propagate to the bucket-key secrets.

It also means that we assume the app should always have ownership rights of the bucket.

Here's a contrived case for using the configMap where I have 2 secrets for 2 different key pairs:

  1. owner: full perms
  2. web frontend: readonly

I would use the owner key pair secret to provision the claim.
Then manually bind the web frontend keypair to the bucket with a readOnly policy. (We don't have plans yet for doing this in k8s)
Finally, I would mount the readonly key secret and the config map.

So storing the access keys and bucket data together reduces our flexibility.

copejon force-pushed the object-bucket-proposal branch from 96054da to 203cbdf on January 16, 2019 20:05
copejon changed the title from "Design Proposal: CephObjectBucket CRD and Controller" to "Design Proposal: ObjectBucketClaim CRD and Controller" on Jan 16, 2019
copejon force-pushed the object-bucket-proposal branch from 203cbdf to 91677fe on January 16, 2019 20:09

One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks in-built synchronization in Kubernetes.

Bucket deletion in the early stages of this controller design will not be addressed beyond cleaning up API objects. Deletion of the bucket within the object store is left as an administrator responsibility. This is to prevent the accidental destruction of data by users.
Contributor:

typo: "deletion is in the early..."

@copejon (Contributor Author) commented Jan 16, 2019:

not actually a typo :)

![work flow](./ObjectBucketClaim.png)

1. The Rook Operator detects a new `ObjectBucketClaim` instance.
1. The operator uses `objectBucketClaim.spec.accessKeySecretName` to get the S3 access keys secret.
Contributor:

is accessKey a necessary part of this property name? I'm thinking that secretName is clear enough. Also, there are both access keys and secret keys so this name could imply there's another secret containing the secret key?

Contributor Author:

Maybe. It feels like it makes it more clear at a glance without hurting readability.

The keys are "access key id" and "secret access key", so it's not too far off. Maybe "accessKeyPairSecret"?

namespace: dev-user
labels:
rook.io/bucket-provisioner:
rook.io/object-bucket-claim: my-bucket-1 <sup>1</sup>
@jeffvance (Contributor) commented Jan 16, 2019:

<sup> pair visible

spec:
storageClassName: some-object-store
accessKeySecretName: my-s3-access-keys
generateBucketName: prefix<sup>2</sup>
Contributor:

ditto re sup

accessKeySecretName: my-s3-access-keys
generateBucketName: prefix<sup>2</sup>
status:
phase: ["creating", "available", "error", "unknown"]
@jeffvance (Contributor) commented Jan 16, 2019:

consider this k8s issue: Eliminate Phase and simplify Conditions

Contributor Author:

There are some good points and counterpoints in there. Rethinking it, maybe we should drop status for the v1 implementation to focus on core mechanics.

labels:
rook.io/object-bucket-claim-controller:
rook.io/object-bucket-claim: my-bucket-1
ownerReferences: <sup>1</sup>
Contributor:

sup visible

copejon force-pushed the object-bucket-proposal branch from 6e89649 to c9c1ea4 on January 16, 2019 21:54
@jeffvance (Contributor):

LGTM. Waiting on comments from others...


Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.

One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks in-built synchronization in Kubernetes.
Member:

typo: thanks "to" in-built...


**Use Case: Provision a Bucket**

_As a Kubernetes user, I want leverage the Kubernetes API to create S3 buckets. I expect to get back the bucket connection information in a ConfigMap._
Member:

typo: I want "to" leverage...

1. The operator uses `objectBucketClaim.spec.storageClassName` to get the `Service` endpoint of the object store.
1. The operator uses the object store endpoint and access keys for an S3 "make bucket" call.
1. The operator creates a ConfigMap in the namespace of the `ObjectBucketClaim` with relevant connection data of the bucket.
1. An app Pod may then mount the Secret and the ConfigMap to begin accessing the bucket.
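
As a rough illustration of the ConfigMap produced in the last two steps, a sketch follows; only `BUCKET_HOST` appears earlier in this proposal, and the remaining keys and values are assumptions:

```yaml
# Hypothetical shape of the generated ConfigMap; key names other than BUCKET_HOST
# are illustrative assumptions, not defined in this proposal.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-bucket-1                # derived from the ObjectBucketClaim name
  namespace: dev-user
data:
  BUCKET_HOST: http://my-store-url-or-ip
  BUCKET_PORT: "80"                # assumed key
  BUCKET_NAME: my-bucket-1-x7f3a   # assumed key; reflects the generated bucket name
```
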
Member:

If the bucket claim is created, there will be a small delay (normally less than a few seconds, but possibly longer if the operator is busy) before the secret is generated and the pod can start. What will be the behavior if the app pod and bucket claim are created at the same time? Will the pod retry starting or crash loop until the secret exists? Or is there something more graceful we can do?

Contributor Author:

I did some testing this morning to double-check: when a pod is created that references a configMap or a key within a configMap (i.e. pod.spec.containers[0] uses envFrom.configMapRef or env[0].valueFrom.configMapKeyRef) before the configMap exists, the pod's status will be CreateContainerConfigError.

Once the configMap is created, the kubelet will start the pod as usual.

This eventual consistency means that users can define their pod spec with references to the credential secret and bucket configmap (with deterministic names) in a single yaml.
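See the sketch below of that co-deployment pattern in a single manifest; the Pod name and image are hypothetical, and the assumption that the ConfigMap shares the claim's name reflects the deterministic-naming intent described above:

```yaml
# Sketch: a claim and a consuming Pod applied together; the Pod stays in
# CreateContainerConfigError until the ConfigMap is created by the operator.
apiVersion: rook.io/v1alpha2
kind: ObjectBucketClaim
metadata:
  name: my-bucket-1
  namespace: dev-user
spec:
  storageClassName: some-object-store
  accessKeySecretName: my-s3-access-keys
  generateBucketName: prefix
---
apiVersion: v1
kind: Pod
metadata:
  name: bucket-consumer             # hypothetical Pod name
  namespace: dev-user
spec:
  containers:
  - name: app
    image: example/s3-app           # hypothetical image
    env:
    - name: BUCKET_HOST
      valueFrom:
        configMapKeyRef:
          name: my-bucket-1         # assumed to match the claim name
          key: BUCKET_HOST
    envFrom:
    - secretRef:
        name: my-s3-access-keys     # the pre-existing credentials Secret
```
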

Contributor Author:

Just to add - when mounting as volume, the pod will sit in containerCreating status until the CM exists.

Contributor:

@copejon and I assume the behavior you tested above for config maps also applies to secrets that aren't yet available to a pod?

_As a Kubernetes user, I want to delete `ObjectBucketClaim` instances and cleanup generated API resources._

1. The user deletes the `ObjectBucketClaim` via `kubectl delete ...`.
1. The `ObjectBucketClaim` is marked for deletion and left in the foreground.
Member:

I'm still not sure there is a need for a finalizer here. Regenerating a lost secret would be simpler than managing a finalizer.

Contributor Author:

In this case, we won't be setting the finalizer ourselves. Instead, we'll set an ownerReference with blockOwnerDeletion: true in the configMap to make it a child of the bucket claim. The kubelet will be responsible for finalizers. This keeps configMaps from being orphaned when the claim is deleted.

blockOwnerDeletion is more of a cherry on top since the configMap isn't blocked from deletion when referenced by a pod (as env var or volume).
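
For reference, a minimal sketch of what that ownerReference could look like on the generated ConfigMap; the names and uid reuse placeholder values from elsewhere in this document:

```yaml
# Sketch of the generated ConfigMap's metadata, making it a child of the
# ObjectBucketClaim so it is garbage collected when the claim is deleted.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-bucket-1
  namespace: dev-user
  ownerReferences:
  - apiVersion: rook.io/v1alpha2
    kind: ObjectBucketClaim
    name: my-bucket-1
    uid: 1234-qwer-4321-rewq     # copied from the parent claim by the operator
    blockOwnerDeletion: true
```
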

generateBucketName: prefix [2]
```

1. Added by the rook operator.
Member:

Not sure I would expect the operator to add a label. What about adding the generated bucket name to the crd status section? The status is easily mutable, but it's best to treat the user-defined crd as immutable.

Member:

Actually, doesn't the corresponding configmap already contain the bucket name?

@copejon (Contributor Author) commented Jan 18, 2019:

What about adding the generated bucket name to the crd status section? The status is easily mutable, but it's best to treat the user-defined crd as immutable.

Fair point. status: edits by users are ignored by controllers. metadata is a grey area I think. I'm fairly certain that k8s core controllers add annotations to user defined objects. I'm not as sure about labels.

The intent was to provide a clean way for users/admins to get all API objects related to the claim operator or a specific bucket: kubectl get cm,objectbucketclaims --all-namespaces -l 'rook.io/object-bucket-claim-controller:' and similarly in code.

@jeffvance (Contributor) commented Jan 19, 2019:

I know the CNV's Container Data Importer (CDI) controller adds a label. Adding annotations is a common practice from what I've seen.

```

1. Added by the rook operator.
1. As with `metadata.generateName`, the operator will append a hyphen followed by random characters to the string given here.
Member:

  • Is generateBucketName optional? Or we will always generate the bucket name?
  • In general, nouns are preferred over verbs in the crd. What about calling this bucketNamePrefix?

@copejon (Contributor Author) commented Jan 18, 2019:

Is generateBucketName optional? Or we will always generate the bucket name?

We will always generate the name to prevent collisions in the object store. The name itself will be stored in key:values or env vars so it will be abstracted from the user somewhat.

In general, nouns are preferred over verbs in the crd. What about calling this bucketNamePrefix?

I'm okay with that, will change.

kind: Secret
metadata:
name: my-s3-key-pair
namespace: dev-user
Member:

Shouldn't the secret also add the ownerReference?

Contributor Author:

I don't think so. It wouldn't be a child of the operator, claim, or configMap. Since this is a generalized design, the secret may be user or operator defined, so its dependencies aren't deterministic from our point of view.


## Design

### Work Flow
Member:

Could you add a section in the workflow for the admin creating the storage class that exposes the object store for bucket creation? This will help illustrate that until the admin 1) creates the storage class and 2) starts the operator that responds to the storage class, no buckets can be provisioned. This will also help explain why it is a general solution and not specific to a provider such as ceph.

Contributor Author:

+1 will do


## Proposed Feature

**A generalized method of S3 bucket provisioning through the implementation of an `ObjectBucketClaim` CRD and controller.**
Member:

will this proposal work for Minio and for EdgeFS buckets as well?

Contributor Author:

In theory, yes :) The controller will import the AWS Go SDK, so stores that present an S3 interface should be compatible.


![work flow](./ObjectBucketClaim.png)

1. The Rook Operator detects a new `ObjectBucketClaim` instance.
Member:

which operator is watching for new ObjectBucketClaim instances to be created? Does that operator/controller have the general responsibility of provisioning buckets for any s3 compatible object store? or will ceph, minio, edgeFS all have their own running and watching?

@copejon (Contributor Author) commented Jan 18, 2019:

which operator is watching for new ObjectBucketClaim instances to be created?

The claim controller will run inside its own operator, the same as the other rook operators (ceph, nfs, edgefs, etc.).

Does that operator/controller have the general responsibility of provisioning buckets for any s3 compatible object store?

Yes, the operator should be agnostic to the store provider; the only requirement is that the provider expose an S3 interface.

Member:

Hmm, wouldn't each provisioner only understand one particular object store type (ceph in rook's case, at least initially)? If we want to provision minio or s3 buckets we'd set the StorageClass provisioner set to some other provisioner (rook minio, or something that does s3, or whatever).

Contributor Author:

wouldn't each provisioner only understand one particular object store type (ceph in rook's case, at least initially)?

As far as the object store goes, yes, since each store provider has its own requirements for provisioning (CLI vs API calls, etc.). However, bucket creation is abstracted through the S3 interface. A single bucket operator can remain unaware of the underlying store type so long as it exposes an S3 endpoint.

name: some-object-store
provisioner: rook.io/object-bucket-claim-controller
parameters:
objectStoreService: my-store
Member:

I'm not sure how this StorageClass could be used by a general bucket provisioner to determine the s3 service endpoint in a general way, these values look specific to the RGW object store config information. How would this work for Minio or EdgeFS?

Contributor:

In EdgeFS, serviceName "my-store" will be exposed as a generic Kubernetes service. Having the service name defined would help the EdgeFS operator (or maybe this could even be done in a generic way) determine the URL endpoint to call. The bucket can then be created via a general API call, which I hope can be part of the Rook framework itself. Essentially the API has to mimic the "curl http://URL:PORT/bucketName -X PUT" semantic, and handle authorization and errors.

Contributor Author:

I think I'm misunderstanding the question. Looking at minio and edgefs: they both generate Service API objects. Referencing the Service in the StorageClass enables the operator to Get() the Service, from which it will get the IP and Port that the S3 app is listening on. In this way, admins can "publish" one or more object stores and the user is not required to know anything about them.
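
A hedged sketch of the kind of Service the StorageClass would point at; the name and namespace reuse placeholders from this thread, while the selector and port are illustrative assumptions:

```yaml
# Example object store Service the operator would Get() to resolve the S3 endpoint.
apiVersion: v1
kind: Service
metadata:
  name: my-store
  namespace: some-namespace
spec:
  selector:
    app: my-object-store-gateway   # assumed label on the store's S3 gateway pods
  ports:
  - name: s3
    port: 80                       # the operator reads the cluster IP and this port
```
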


Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.

One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks in-built synchronization in Kubernetes.
Member:

are the deterministic rules for determining the configMap name defined in this proposal?

Contributor Author:

Good point, will add a pattern for naming.

Contributor:

I have a feeling we're overdesigning this a bit. Pretty much all object storage systems have to handle dynamic bucket creation via the AWS S3 API. If they don't, then they're incompatible and we should not bother. The Rook framework would need to provide v2 and v4 authentication support, allow a user to select that, and initiate the HTTP(S) call using CRD-provided authorization credentials. Why do we need to introduce a storage class?

Member:

That's true that object storage systems should generally provide a standard API for bucket creation, but that's not the case for user creation (both radosgw and minio do it via a CLI, for example), and the provisioner needs to do both users and buckets.

The purpose of the StorageClass is to describe which object store to provision against. It might be ceph cluster foo, AWS S3 region us-east, whatever.

@jeffvance (Contributor) commented Jan 19, 2019:

The storage class contains a provisioner property which can be used by the bucket controller to decide how to create the bucket. Error on my part: the provisioner field is essentially ignored. It could be checked by the controller to make sure it's the expected value, but its value is just a hard-coded string that implies bucket provisioning. The CRD operator is just watching for bucket CRD events, and any event on this type of CRD is s3-compatible-bucket related.

The parameters property names the object store service and namespace, which relieves the app developer from these concerns.

@copejon (Contributor Author) commented Jan 21, 2019:

a standard API for bucket creation, but that's not the case for user creation (both radosgw
and minio do it via a CLI, for example), and the provisioner needs to do both users and buckets.

@liewegas Is the implication that bucket creation would be implemented redundantly in each rook object provisioner? It's clear why we need to do this for users; however, bucket creation could be handled by a single operator with an AWS S3 client.

Signed-off-by: Jon Cope <[email protected]>
Signed-off-by: Jon Cope <[email protected]>
@dyusupov (Contributor) commented Jan 18, 2019 via email

@jeffvance (Contributor):

Isn't it better to isolate user provisioning from bucket creation to keep things a bit simpler?

This bucket provisioning proposal does not depend on a user. It just needs a secret (creds) and an object store service.

@copejon (Contributor Author) commented Jan 21, 2019:

I'd suggest we should stick to generic Rook provisioning CRD, which would
support all the S3 compatible stores via AWS S3 API.

This is the core design point of this proposal. By leveraging the S3 API, we can generalize bucket provisioning in a single operator.

And make this proposal be optional, or perhaps make it operator specific, to begin with.

This seems to reverse the above quote. Bucket creation via the S3 API is the core concept of this proposal. That said, I'm only familiar with ceph and minio and not edgeFS. Does it provide bucket provisioning operations outside of the S3 API?

@@ -45,6 +43,7 @@ The admin ...

- creates the ObjectBucketClaim Operator (a Deployment API object)
- creates a StorageClass with `parameters[serviceName]=objectstore-service` and `parameters[serviceNamespace]=some-namespace`
- `provisioner` field is left blank. Since only the ObjectBucketClaim operator is watching ObjectBucketClaim instances, defining the operator is unnecessary.
@jeffvance (Contributor) commented Jan 21, 2019:

"...defining the operator/provisioner is unnecessary, and If a value is provided it is silently ignored."

Agree?

parameters:
serviceName: my-store
serviceNamespace: some-namespace
```

1. Since ObjectBucketClaim instances are only watched by the ObjectBucketClaim operator, defining the provisioner is unnecessary.
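
Putting the fragment above together, a minimal sketch of the full StorageClass an admin might create; the provisioner string is the earlier placeholder from this proposal and, per note 1, its value is effectively ignored:

```yaml
# Sketch of the admin-created StorageClass exposing an object store for bucket claims.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: some-object-store
provisioner: rook.io/object-bucket-claim-controller  # kept non-empty; ignored by the operator (note 1)
parameters:
  serviceName: my-store
  serviceNamespace: some-namespace
```
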
Contributor:

Like


**A generalized method of S3 bucket provisioning through the implementation of an ObjectBucketClaim CRD and controller.**

Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.
@jeffvance (Contributor) commented Jan 21, 2019:

nit: keep all singular or all plural. Eg.
Users will request a bucket via the Kubernetes API and have returned a ConfigMap with connection information for that bucket, or
Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for each bucket


One ConfigMap will be created per ObjectBucketClaim instance. ConfigMap names will be semi-deterministic and derived from the ObjectBucketClaim name. If an ObjectBucketClaim is created with the `generateName` field, then the name cannot be known ahead of time.

Additionally, the ObjectBucketClaim and Pod can be deployed at the same time thanks to in-built synchronization in Kubernetes. Pods mounting the configMap as either `envFrom:` or `env[0].valueFrom.ConfigMapKeyRef` will have a status of `CreateContainerConfigError` until the configMap is created. Pods mounting as a volume will have a status of `ContainerCreating` until the configMap is created. In both cases, the Pod will start once the configMap exists.
@jeffvance (Contributor) commented Jan 21, 2019:

nit: Pods mounting as a volume will have ... --> "Pods mounting a configMap as a volume will have"

#### Assumptions

- An object store has been provisioned with a reachable endpoint. ([e.g. the Rook-Ceph Object provisioner](https://rook.github.io/docs/rook/master/ceph-object.html)) This may also be an external endpoint such as AWS S3.
- A Service and/or Endpoint API object exists to route connections to the object store. Typically, object store operators generate these. In cases where the store is external to the cluster, they must be configured manually.
Contributor:

nit: for "A Service and/or Endpoint API object exists", I'd be explicit and say "A Kubernetes Service and/or Endpoint API object exists".


**Use Case: Expose an Object Store Endpoint**

_As a Kubernetes admin, I want to expose an object store to cluster users for bucket provisioning via the Kubernetes API._
Contributor:

I don't see a difference between this use-case and the one above.

Contributor Author:

editing mistake, will fix

rook.io/bucket-provisioner:
rook.io/object-bucket-claim: my-bucket-1 [1]
spec:
storageClassName: some-object-store
@jeffvance (Contributor) commented Jan 22, 2019:

Do you see value in supporting both storageClass and store? See 2482 comment regarding providing both. A kick-the-tires use case might be better served if the object store service name can be supplied directly without reliance on a storage admin to create a storage class.

@copejon (Contributor Author) commented Jan 22, 2019:

It's a good point. Instead of store: though, it should either be a store url or a kubernetes service name. I lean towards URL.

namespace: dev-user
labels:
rook.io/bucket-provisioner:
rook.io/object-bucket-claim: my-bucket-1 [1]
Contributor:

do you think the operator should also add an annotation containing the configMap name? It can be easily derived so not essential...

Contributor Author:

It's a reasonable thought. Another way to get the associated CM would be by rook.io/object-bucket-claim: my-bucket-1 label selector. It's slightly more obtuse - what do you think?

```yaml
apiVersion: rook.io/v1alpha2
kind: ObjectBucketClaim
metadata:
@jeffvance (Contributor) commented Jan 22, 2019:

should there be an ownerReferences value pointing to the ceph cluster here? IOW, when the cluster is deleted this CRD would also be deleted. Or is it better long term to not tie this CRD to the rook-ceph cluster?

@dyusupov (Contributor):

That said, I'm only familiar with ceph and minio and not edgeFS. Does it provide bucket provisioning operations outside of the S3 API?

It does. This proposal makes sense indeed. How are we going to deal with region/namespace and tenantName? Will we pass it down to storage provider level?

@copejon (Contributor Author) commented Jan 24, 2019:

How are we going to deal with region/namespace ...

IMO region/namespace would fit best in the StorageClass as an optional parameter. This puts the onus on the cluster admin to know if the object provider they are using supports/expects regions.

For n regions, a cluster admin would define <=n storageClasses, each representing a different region in the region parameter field. A storage class naming convention should be encouraged in documentation. e.g. s3-object-store-us-east-1

Through RBAC, cluster admins can define access to each region per namespace by limiting access to a subset of storageClasses. This is accomplished with the resources[ ] and resourceNames[ ] fields in (cluster)roles.
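
A hedged sketch of the "(cluster)roles" mechanism described here, assuming the `s3-object-store-us-east-1` naming convention from this comment; note that `resourceNames` restricts `get` but not `list`:

```yaml
# Hypothetical ClusterRole limiting which StorageClass (and thus which region) can be read.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: use-s3-object-store-us-east-1   # hypothetical role name
rules:
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  resourceNames: ["s3-object-store-us-east-1"]
  verbs: ["get"]
```
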

and tenantName

After some cursory reading, my understanding is that tenant names are optional in aws s3, minio, ceph, and edgefs. If left empty, an empty string is applied as the tenant name. I think we have a few options in how we could address the feature:

  1. Left to later versions of the operator. Bucket requests are submitted without a specified tenant.
  • In the alpha v1 version of this controller, this might be preferable to deliver an MVP with plans to support it in later versions.
  2. Represented as a separate CRD and provisioned by another controller inside this proposal's operator.
  • Adds complexity to the early design but would start the project out with a common feature.
  • Abstracting tenants at the kube layer provides users kubectl CRUD ops on their tenants.
  • objectBucketClaims would reference the tenant API object by name (co-namespaced is assumed).
  3. Defined in the objectBucketClaim as an optional field. If defined, the tenant is created on demand by the operator, which then creates the bucket with that tenant.
  • Streamlined; doesn't increase user actions on bucket creation.
  • Does not provide kubectl CRUD ops for tenants.
  • Cleanup left to users/admins or automated in the controller.
I'm undecided on which would be the most reasonable choice (or if there are others I haven't thought of).

@dyusupov @liewegas @travisn @jeffvance thoughts?

@dyusupov (Contributor):

How are we going to deal with region/namespace ...

Through RBAC, cluster admins can define access to each region per namespace by limiting access to a subset of storageClasses. This is accomplished with the resources[ ] and resourceNames[ ] fields in (cluster)roles.

This works for edgefs. The difference, I guess, from other providers is that EdgeFS needs the namespace (btw, it can be anything and is not necessarily limited by AWS dictations) as well as the tenantName in order to build a universally unique object path, e.g. NAMESPACE/TENANT/BUCKET/OBJECT. It is called the Name Hash ID and is used for cross-site lookups, metadata-only transfers, data I/O mesh policy, etc.

and tenantName

  1. represented as a separate CRD and provisioned by another controller inside this proposal's operator
  2. defined in the objectBucketClaim as an optional field. If defined, the tenant is created on demand by the operator, then creates the bucket with that tenant.

I would do (1) and (2). We could defer (1) to a separate pull request while having (2) be predefined as a part of this pull request.

@copejon (Contributor Author) commented Jan 25, 2019:

How are we going to deal with region/namespace ...

Through RBAC, cluster admins can define access to each region per namespace by limiting access to a subset of storageClasses. This is accomplished with the resources[ ] and resourceNames[ ] fields in (cluster)roles.

This works for edgefs. The difference, I guess, from other providers is that EdgeFS needs the namespace (btw, it can be anything and is not necessarily limited by AWS dictations) as well as the tenantName in order to build a universally unique object path, e.g. NAMESPACE/TENANT/BUCKET/OBJECT. It is called the Name Hash ID and is used for cross-site lookups, metadata-only transfers, data I/O mesh policy, etc.

and tenantName

  1. represented as a separate CRD and provisioned by another controller inside this proposal's operator
  2. defined in the objectBucketClaim as an optional field. If defined, the tenant is created on demand by the operator, then creates the bucket with that tenant.

I would do (1) and (2). We could defer (1) to a separate pull request while having (2) be predefined as a part of this pull request.

I think this seems reasonable. Implementing (2) early on would lay a framework for handling defined tenants. (1) would follow up with a change in how the information is delivered to that framework.

I'll add these changes to the doc

@jeffvance (Contributor):

Re tenants: I haven't thought this through, so just pitching it as a possibility. Perhaps options 0 and 2 can be combined. My thinking is that I don't want to slow down the dev of bucket provisioning and option 2 could be done later.
What I mean by "combining" 0 and 2 is that we don't yet support tenants but we do support a tenant field as an optional parameter. The API checks that if tenant is provided its value must be empty; otherwise an error is returned. This means that no one can be using tenants in phase 1, but the field exists and can be added in phase 2 without a major version roll. Thoughts?

@copejon (Contributor Author) commented Jan 25, 2019:

The downside of 2 on its own is that users/admins would be expected to perform tenant cleanup manually, outside of k8s. I realize that we set this requirement for buckets; should it extend to tenants as well?

cc @liewegas @travisn


A cluster user will request a bucket via the Kubernetes API and have returned a ConfigMap with connection information for that bucket.

One ConfigMap will be created per ObjectBucketClaim instance. ConfigMap names will be semi-deterministic and derived from the ObjectBucketClaim name. If an ObjectBucketClaim is created with the `generateName` field, then the name cannot be known ahead of time.


Why not create an ObjectBucket object, like we do with PVC and PV?

Contributor Author:

The K8s Pod API expects a mountable/attachable k8s primitive (configMap, secret) for data injection, so we have to use a core API resource instead of a PV analog CRD.

namespace: dev-user
data:
accessKeyId: <base64 encoded string>
secretAccessKey: <base64 encoded string>


I'd advocate against the use of AWS secrets.
1st AWS is working on binding IAM to pods.
2nd in the meantime, I'm fine with scheduling the pod on nodes that have the right instance profile.

@copejon (Contributor Author) commented Jan 29, 2019:

I'd advocate against the use of AWS secrets.
1st AWS is working on binding IAM to pods.

Will this work for object stores besides AWS S3 (ceph object, edgefs, minio, etc)?

2nd in the meantime, I'm fine with scheduling the pod on nodes that have the right instance profile.

Does this assume multiple cluster users would use the same instance profile? If user A and user B work within their own namespaces and require access to an object store, would both of their pods potentially use the same instance profile?

@jbw976 (Member) commented Feb 2, 2019:

Thank you for meeting to discuss this further today. To summarize the discussion, we think that this support for generalized bucket provisioning is something that the greater Kubernetes community wants to see and has value. We also agree that being able to demonstrate something working, as opposed to just having a "grand idea", is more effective at getting community buy-in and rallying support.

We think this is going in the right direction, but the concern here is that this work would be better served in a different venue; it might not be a strong fit within the scope of the Rook project.

Bucket provisioning would be better suited at the workload level, supported by a scheduler, topology aware placement, dynamic/static resource matching and binding, separation of concerns (admin creds to create bucket, app creds to consume bucket, user management), etc. Essentially more machinery around the consuming workload. A design that has the abstraction closer to the app level and scheduling/provisioning is a stronger approach.

That would be similar to the PVC/PV model where the claim abstraction is closer to the workload and scheduler. Rook should be focusing on orchestrating and integrating concrete implementations of storage. Rook should be bringing new providers of storage into this ecosystem that implement these abstractions, but Rook shouldn't go beyond the storage (e.g., into scheduling and workload machinery).

This would likely be a better fit for the following venues:

We are very open to collaborating and incorporating this effort into the Crossplane project. You can read more in the Crossplane vision and architecture doc.

We need to figure out what to split out and what to keep from the design in this PR. I think the generalized bucket abstraction and a bucket controller make sense split out while a concrete Ceph bucket CRD makes more sense here in Rook. We can discuss further and once again thank you for your efforts and driving this initiative.

@copejon (Contributor Author) commented Feb 6, 2019:

@jbw976 @bassam @liewegas @travisn @eboyd @jeffvance

Thank you for sharing your time with us; we took a lot away from the meeting and it's given us a better sense of the direction this project should head. We firmly agree that a generalized implementation of a bucket provisioner would be better implemented somewhere other than Rook.

With that in mind, I'm going to go ahead and close this PR so that we can start a fresh dialogue within the scope of a concrete Ceph CRD design. The benefit is that the conversation won't carry the history of drastic changes from specific to generalized and back to specific design scope.

I'll reference this PR in the next so that we can still keep track of the overall history of the proposal.

copejon closed this on Feb 6, 2019
copejon mentioned this pull request on Feb 6, 2019
@bassam (Member) commented Feb 7, 2019:

Thanks @copejon and @eboyd! Looking forward to seeing how this generalized bucket approach develops!
