Design Proposal: ObjectBucketClaim CRD and Controller #2483
Conversation
add design doc for new CephObjectBucket CRD and controller Signed-off-by: Jon Cope <[email protected]>
This design closes the loop for Rook-Ceph object stores with a CephObjectBucket controller and CRD. Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket. One ConfigMap will be created per CephObjectBucket. ConfigMap names will be deterministic and based on the name of the CephObjectBucket. This gives users the opportunity to define their Pod spec at the same time as the CephObjectBucket. Additionally, the CephObjectBucket and Pod can safely be deployed at the same time thanks to built in synchronization in Kubernetes.
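To make the naming relationship concrete, here is a minimal sketch (the apiVersion is taken from a fragment later in this thread; the claim name and the derived ConfigMap name prefix are hypothetical, not part of the proposal):

```yaml
# User-created claim (sketch only).
apiVersion: object.k8s.io/v1alpha
kind: CephObjectBucket
metadata:
  name: my-bucket-1
  namespace: dev-user
---
# ConfigMap generated by the controller. Its name is derived deterministically
# from the claim name so a Pod spec can reference it before it exists.
# The prefix shown here is an assumption, not defined in the proposal.
apiVersion: v1
kind: ConfigMap
metadata:
  name: cephobjectbucket-my-bucket-1
  namespace: dev-user
```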
Bucket deletion in the early stages of this controller design will not be addressed beyond cleaning up generated resources. Deleting a CephObjectBucket will only cause deletion of the associated ConfigMap. The controller will not delete the bucket from the object store.
Deletion of the underlying bucket is avoided because it is considered an admin action, correct? I saw that elsewhere, but you might mention it here as well.

That's correct. The consensus in early drafts was that it was preferable to suspend the bucket on API resource deletion; Ceph seems to supply this functionality. The reasoning is that it would be far too easy to accidentally destroy your buckets.
1. The user deletes the CephObjectBucket. This triggers an automated cleanup sequence:
   1. The CephObjectBucket is marked for deletion first and left in the foreground. Deletion is blocked by a finalizer on the CephObjectBucket, pending step (b).
   1. The respective ConfigMap is deleted.
   1. The finalizer in the CephObjectBucket is removed and the object is garbage collected.
Since we're not deleting the underlying bucket, do we still need the finalizer? If we're only cleaning up K8s resources, we shouldn't need a finalizer.

Good point, I'll reword this. The intent is to set an ownerRef on the CM as a child of the CephObjectBucket with `blockOwnerDeletion: true`. It seems like a fair safeguard against orphaning COBs or their CMs.
### CephObjectBucket

```yaml
apiVersion: object.k8s.io/v1alpha
```

Currently we're adding all Ceph CRDs to the apiVersion `ceph.rook.io/v1` since a single api group must all have the same version. But separately I'd like to find a way to allow new CRDs to bake until we really think they're stable.

I'd also appreciate that. Adding this as a v1 when additions / changes are going to be made would be misleading. I'll change it to v1 for the time being since that's the standard practice.
finalizers:
- rook-ceph.io/provisioner/my-bucket-1
data:
  BUCKET_HOST: http://my-store-url-or-ip

For naming consistency it would be nice if these vars weren't all caps. Is this necessary to make it easier to mount the configmap as env vars with the expected name?

This was done to adhere to bash conventions on env var names when mounting CMs with `envFrom:`, where the entire CM is exported into the env. This is also done in the k8s docs.
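For reference, this is the consumption pattern in question — a minimal sketch, with hypothetical Pod, image, and ConfigMap names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-app                       # hypothetical name
  namespace: dev-user
spec:
  containers:
  - name: app
    image: example/s3-app:latest     # placeholder image
    envFrom:
    # Every key in the ConfigMap becomes an environment variable in the
    # container, which is why ALL_CAPS keys follow shell conventions.
    - configMapRef:
        name: my-bucket-1            # assumed ConfigMap name
```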
As for the field names, if these are a little too generic, I'm fine with renaming them to something more explicit.

These names look good. My only thought is that we might add the `ROOK_` prefix to them if we think there might be a conflict with other env vars, but not sure it's worth worrying about.

I think that's probably worth the small effort to change it. Will do.
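A sketch of how the generated ConfigMap might look after that rename. Only `BUCKET_HOST` appears in the proposal text; the other keys, their `ROOK_` spellings, and the ConfigMap name are assumptions:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-bucket-1                   # assumed: derived from the claim name
  namespace: dev-user
data:
  ROOK_BUCKET_HOST: http://my-store-url-or-ip
  ROOK_BUCKET_PORT: "80"              # assumed key
  ROOK_BUCKET_NAME: my-bucket-1-x7f3a # assumed key: the generated bucket name
```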
provisioner: kubernetes.io/rook-ceph-operator
parameters:
  # namespace/objectStoreService
  objectStore: rook-ceph.io/objectStoreName
Instead of `namespace/objectStoreName`, why not have two separate properties for `namespace` and `objectStore`?

Was `rook-ceph.io` intended for the namespace, or should this be `rook-ceph`?

> Instead of namespace/objectStoreName, why not have two separate properties for namespace and objectStore?

I'm fine with that, will change.

> was rook-ceph.io intended for the namespace or should this be rook-ceph?

Nope! That's a typo, will fix.
…params into ns and name Signed-off-by: Jon Cope <[email protected]>
uid: 1234-qwer-4321-rewq
apiVersion: object.k8s.io/v1
kind: CephObjectBucket
finalizers:
@travisn We've tried to make use of finalizers to make the system more robust against user error. Do you think this is a better option than either:
- let it be deleted, do nothing
- let it be deleted but regenerate it
Is there a common practice for this in rook?
In Rook we assume the user is an admin and they will have full access to the resources. If they delete resources rook relies on, it could kill the cluster. I don't recall hearing feedback from users that they think this is a problem.
However, in this case, it's not the admin who would be affected. Perhaps end users might be more prone to accidental operations? In my experience, finalizers should be used very sparingly. You have to rely on the rook operator to automate removal of the finalizer. If the operator is stopped before a finalizer is removed, the resource will live forever. As an example, we do have a finalizer on the CephCluster CRD and because users might stop the operator before removing the cluster, we end up with special cleanup documentation.
@travisn For now I'm ok with this level of manual cleanup in cases where the operator is unavailable. At least the finalizer can be removed and the protected object deleted, but normally the operator would handle deletion of the finalizer.
### Assumptions

- The `storageClass` field has been added to the CephObjectStoreUser API
- Rook-Ceph Object has been deployed in a Kubernetes Cluster ([_see:_ quick start](https://rook.github.io/docs/rook/master/object.html))
this link is broken for me, did you mean https://rook.io/docs/rook/master/ceph-object.html?
Is this strictly necessary to implement this new bucket CRD and controller? It looks like this proposal could still be implemented without the StorageClass being added to the `CephObjectStoreUser` CRD.
@jbw976 Adding storage class to the user CRD is not necessary for the bucket design. The bucket CRD just needs to know the object store service and secret. As long as the name of the key-pair secret is deterministic, the `CephObjectStoreUser` is not required at all.

Update: per some discussions with @copejon, it is not that important that the secret name be deterministic. Although that's possible with a minor (breaking) change to the rook object user, the secret name may not be deterministic for other object stores. Also, considering the brown-field case where the user wants to wrap an `ObjectBucketClaim` around an existing bucket and creds, the secret name may typically be non-deterministic. It's not much of a burden on the app pod developer to list the target secret before creating her pod.

- The `CephObjectStoreUser` has been created in the developer's namespace and the associated credentials are stored in a `Secret`
what do you think of the possibility of generating a user on-demand for the bucket if there is no user specified? or do we want to be very explicit and keep that as a separate and required pre-requisite step?
@copejon and I like that idea and it has been mentioned before. What is your opinion on phasing this to a later pr, or would this be the more common use case?
@jbw976 @jeffvance Given that we're changing direction to a generalized provisioner rather than a ceph specific one, this would be a step back for us. The reason being that we'd be coupled (again) to rook-ceph CRDs which is what we're trying to avoid.
...we'd be coupled (again) to rook-ceph CRDs which is what we're trying to avoid.
+1
[1] `bucketName` and `generateBucketName` are mutually exclusive. It is expected that `generateBucketName` be the common usage. This field produces a random bucket name beginning with the value of this stanza. If bucketName is defined then that value alone is used as the name of the bucket. Within an object store, buckets must be unique, so if bucketName is used there may be a higher chance of name collision. The api will continue retrying if the bucket cannot be created.

### CephObjectBucketConfigMap
what do you think about capturing all connection information, including both location information and credentials information, into a single secret? that may be an easier model for apps/pods to consume everything they need to connect from a single k8s object. thoughts?
We've considered adding the endpoint to the secret and eliminating the configMap. A couple of thoughts:
- the endpoint value doesn't need to be encrypted/hidden so it better "fits" the configMap use case.
- if I have one user (and thus 1 secret) and I want to create two buckets for the same user, then the secret would need a list of endpoints. But if the real bucket name is randomized then the secret needs to associate the endpoint to the bucket CRD name.
@jbw976 Do you mean copying the access keys into a new secret? It would definitely simplify things but it does mean that changes to the secret (changing credentials) wouldn't propagate to the bucket-key secrets.
It also means that we assume the app should always have ownership rights of the bucket.
Here's a contrived case for using the configMap where I have 2 secrets for 2 different key pairs:
- owner: full perms
- web frontend: readonly
I would use the owner key pair secret to provision the claim.
Then manually bind the web frontend keypair to the bucket with a readOnly policy. (We don't have plans yet for doing this in k8s)
Finally, I would mount the readonly key secret and the config map.
So storing the access keys and bucket data together reduces our flexibility.
Signed-off-by: Jon Cope <[email protected]>
One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks in-built synchronization in Kubernetes.

Bucket deletion in the early stages of this controller design will not be addressed beyond cleaning up API objects. Deletion of the bucket within the object store is left as an administrator responsibility. This is to prevent the accidental destruction of data by users.
typo: "deletion is in the early..."
not actually a typo :)
![work flow](./ObjectBucketClaim.png)

1. The Rook Operator detects a new `ObjectBucketClaim` instance.
1. The operator uses `objectBucketClaim.spec.accessKeySecretName` to get the S3 access keys secret.
Is `accessKey` a necessary part of this property name? I'm thinking that `secretName` is clear enough. Also, there are both access keys and secret keys, so this name could imply there's another secret containing the secret key?
Maybe. It feels like it makes it more clear at a glance without hurting readability.
The keys are "access key id" and "secret access key", so it's not too far off. Maybe "accessKeyPairSecret"?
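For context, the key-pair Secret being referenced would look roughly like this; the metadata names and data keys are copied from fragments elsewhere in this thread, and `type: Opaque` is an assumption:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-access-keys        # value referenced by accessKeySecretName
  namespace: dev-user
type: Opaque
data:
  accessKeyId: <base64 encoded string>
  secretAccessKey: <base64 encoded string>
```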
  namespace: dev-user
  labels:
    rook.io/bucket-provisioner:
    rook.io/object-bucket-claim: my-bucket-1 <sup>1</sup>
<sup> pair visible
spec:
  storageClassName: some-object-store
  accessKeySecretName: my-s3-access-keys
  generateBucketName: prefix<sup>2</sup>
ditto re sup
  accessKeySecretName: my-s3-access-keys
  generateBucketName: prefix<sup>2</sup>
status:
  phase: ["creating", "available", "error", "unknown"]
consider this k8s issue: Eliminate Phase and simplify Conditions
There's some good points and counter points in there. Rethinking it, maybe we should drop status for the v1 implementation to focus on core mechanics.
  labels:
    rook.io/object-bucket-claim-controller:
    rook.io/object-bucket-claim: my-bucket-1
  ownerReferences: <sup>1</sup>
sup visible
Signed-off-by: Jon Cope <[email protected]>
LGTM. Waiting on comments from others...

Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.

One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks in-built synchronization in Kubernetes.
typo: thanks "to" in-built...
**Use Case: Provision a Bucket**

_As a Kubernetes user, I want leverage the Kubernetes API to create S3 buckets. I expect to get back the bucket connection information in a ConfigMap._
typo: I want "to" leverage...
1. The operator uses `objectBucketClaim.spec.storageClassName` to get the `Service` endpoint of the object store.
1. The operator uses the object store endpoint and access keys for an S3 "make bucket" call.
1. The operator creates a ConfigMap in the namespace of the `ObjectBucketClaim` with relevant connection data of the bucket.
1. An app Pod may then mount the Secret and the ConfigMap to begin accessing the bucket.
If the bucket claim is created, there will be a small delay (normally less than a few seconds, but possibly longer if the operator is busy) before the secret is generated and the pod can start. What will be the behavior if the app pod and bucket claim are created at the same time? Will the pod retry starting or crash loop until the secret exists? Or is there something more graceful we can do?
I did some testing this morning to double-check: when a pod is created that references a configMap or a key within a configMap (i.e. `pod.spec.containers[0]` uses `envFrom.configMapRef` or `env[0].valueFrom.configMapKeyRef`) before the configMap exists, the pod's status will be `CreateContainerConfigError`. Once the configMap is created, the kubelet will start the pod as usual.

This eventual consistency means that users can define their pod spec with references to the credential secret and bucket configmap (with deterministic names) in a single yaml.
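A minimal sketch of that single-yaml pattern, reusing names from examples in this thread; the app Pod, its image, and the assumption that the ConfigMap shares the claim's name are hypothetical:

```yaml
# Claim and consuming Pod applied together. The Pod reports
# CreateContainerConfigError until the operator creates the ConfigMap,
# then starts normally.
apiVersion: rook.io/v1alpha2
kind: ObjectBucketClaim
metadata:
  name: my-bucket-1
  namespace: dev-user
spec:
  storageClassName: some-object-store
  accessKeySecretName: my-s3-access-keys
  generateBucketName: prefix
---
apiVersion: v1
kind: Pod
metadata:
  name: photo-api                     # hypothetical app
  namespace: dev-user
spec:
  containers:
  - name: app
    image: example/photo-api:latest   # placeholder image
    envFrom:
    - secretRef:
        name: my-s3-access-keys       # pre-existing credential secret
    - configMapRef:
        name: my-bucket-1             # assumed: ConfigMap named after the claim
```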
Just to add: when mounting as a volume, the pod will sit in `ContainerCreating` status until the CM exists.
@copejon and I assume the behavior you tested above for config maps also applies to secrets that aren't yet available to a pod?
_As a Kubernetes user, I want to delete `ObjectBucketClaim` instances and cleanup generated API resources._

1. The user deletes the `ObjectBucketClaim` via `kubectl delete ...`.
1. The `ObjectBucketClaim` is marked for deletion and left in the foreground.
I'm still not sure there is a need for a finalizer here. Regenerating a lost secret would be simpler than managing a finalizer.
In this case, we won't be setting the finalizer ourselves. Instead, we'll set an `ownerReference` with `blockOwnerDeletion: true` in the configMap to make it a child of the bucket claim. The kubelet will be responsible for finalizers. This keeps configMaps from being orphaned when the claim is deleted.

`blockOwnerDeletion` is more of a cherry on top since the configMap isn't blocked from deletion when referenced by a pod (as env var or volume).
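A sketch of that ownerReference on the generated ConfigMap; the uid and resource names reuse values from examples in this thread, and the assumption that the ConfigMap shares the claim's name is hypothetical:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-bucket-1                # assumed: named after the claim
  namespace: dev-user
  ownerReferences:
  - apiVersion: rook.io/v1alpha2
    kind: ObjectBucketClaim
    name: my-bucket-1
    uid: 1234-qwer-4321-rewq       # uid of the claim, copied by the operator
    blockOwnerDeletion: true
data:
  BUCKET_HOST: http://my-store-url-or-ip
```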
```yaml
generateBucketName: prefix [2]
```

1. Added by the rook operator.
Not sure I would expect the operator to add a label. What about adding the generated bucket name to the crd `status` section? The status is easily mutable, but it's best to treat the user-defined crd as immutable.
Actually, doesn't the corresponding configmap already contain the bucket name?
> What about adding the generated bucket name to the crd status section? The status is easily mutable, but it's best to treat the user-defined crd as immutable.

Fair point. `status:` edits by users are ignored by controllers. `metadata` is a grey area I think. I'm fairly certain that k8s core controllers add annotations to user defined objects. I'm not as sure about labels.

The intent was to provide a clean way for users/admins to get all API objects related to the claim operator or a specific bucket: `kubectl get cm,objectbucketclaims --all-namespaces -l 'rook.io/object-bucket-claim-controller:'`, and similarly in code.
I know the CNV's Container Data Importer (CDI) controller adds a label. Adding annotations is a common practice from what I've seen.
1. Added by the rook operator.
1. As with `metadata.generateName`, the operator will append a hyphen followed by random characters to the string given here.
- Is `generateBucketName` optional? Or will we always generate the bucket name?
- In general, nouns are preferred over verbs in the crd. What about calling this `bucketNamePrefix`?
> Is generateBucketName optional? Or we will always generate the bucket name?

We will always generate the name to prevent collisions in the object store. The name itself will be stored in key:values or env vars so it will be abstracted from the user somewhat.

> In general, nouns are preferred over verbs in the crd. What about calling this bucketNamePrefix?

I'm okay with that, will change.
kind: Secret
metadata:
  name: my-s3-key-pair
  namespace: dev-user
Shouldn't the secret also add the `ownerReference`?
I don't think so. It wouldn't be a child of the operator, claim, or configmap. Since this is a generalized design, the secret may be user or operator defined, so its dependencies aren't deterministic from our point of view.
## Design

### Work Flow
Could you add a section in the workflow for the admin creating the storage class that exposes the object store for bucket creation? This will help illustrate that until the admin 1) creates the storage class and 2) starts the operator that responds to the storage class, no buckets can be provisioned. This will also help explain why it is a general solution and not specific to a provider such as ceph.
+1 will do
Signed-off-by: Jon Cope <[email protected]>
## Proposed Feature

**A generalized method of S3 bucket provisioning through the implementation of an `ObjectBucketClaim` CRD and controller.**
will this proposal work for Minio and for EdgeFS buckets as well?
In theory, yes :) The controller will import the aws go sdk so stores that present an S3 interface should be compatible.
![work flow](./ObjectBucketClaim.png)

1. The Rook Operator detects a new `ObjectBucketClaim` instance.
Which operator is watching for new `ObjectBucketClaim` instances to be created? Does that operator/controller have the general responsibility of provisioning buckets for any s3 compatible object store? Or will ceph, minio, edgeFS all have their own running and watching?
> which operator is watching for new ObjectBucketClaim instances to be created?

The claim controller will run inside its own operator, the same as the other rook operators (ceph, nfs, edgefs, etc).

> Does that operator/controller have the general responsibility of provisioning buckets for any s3 compatible object store?

Yes, the operator should be agnostic to the store provider. The only requirement is that the provider expose an S3 interface.
Hmm, wouldn't each provisioner only understand one particular object store type (ceph in rook's case, at least initially)? If we want to provision minio or s3 buckets we'd set the StorageClass provisioner to some other provisioner (rook minio, or something that does s3, or whatever).
> wouldn't each provisioner only understand one particular object store type (ceph in rook's case, at least initially)?

As far as the object store, yes, since each store provider has its own requirements for provisioning (CLI vs API calls, etc). However, bucket creation is abstracted through the S3 interface. A single bucket operator can remain unaware of the underlying store type so long as it exposes an S3 endpoint.
  name: some-object-store
provisioner: rook.io/object-bucket-claim-controller
parameters:
  objectStoreService: my-store
I'm not sure how this StorageClass could be used by a general bucket provisioner to determine the s3 service endpoint in a general way, these values look specific to the RGW object store config information. How would this work for Minio or EdgeFS?
In EdgeFS serviceName "my-store" will be exposed as generic Kubernetes service. Having service name defined would help EdgeFS operator (or maybe even be done in a generic way also) to determine URL endpoint to call. The bucket then can be created via general API call which I hope can be part of Rook framework itself. Essentially API has to mimic "curl http://URL:PORT/bucketName -X PUT" semantic, handle authorization and errors.
I think I'm misunderstanding the question. Looking at minio and edgefs: they both generate Service API objects. Referencing the Service in the StorageClass enables the operator to `Get()` the Service, from which it will get the IP and Port that the S3 app is listening on. In this way, admins can "publish" one or more object stores and the user is not required to know anything about them.
Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.

One ConfigMap will be created per `ObjectBucketClaim` instance. ConfigMap names will be deterministic and derived from the `ObjectBucketClaim` name. This gives users the opportunity to define their Pod spec at the same time as the `ObjectBucketClaim`. Additionally, the `ObjectBucketClaim` and Pod can be deployed at the same time thanks in-built synchronization in Kubernetes.
Are the deterministic rules for determining the configMap name defined in this proposal?
Good point, will add pattern for naming
I have a feeling we're over-designing this a bit. Pretty much all object storage systems have to handle dynamic bucket creation via AWS S3 API call. If they don't - then it means it is incompatible, and we should not bother. Rook framework would need to provide v2 and v4 authentication support, allow a user to select that and initiate the HTTP(S) call using CRD provided authorization credentials. Why do we need to introduce a storage class?
That's true that object storage systems should generally provide a standard API for bucket creation, but that's not the case for user creation (both radosgw and minio do it via a CLI, for example), and the provisioner needs to do both users and buckets.

The purpose of the StorageClass is to describe which object store to provision against. It might be ceph cluster foo, AWS S3 region us-east, whatever.
The storage class contains a `provisioner` property which can be used by the bucket controller to decide how to create the bucket. Error on my part: the provisioner field is essentially ignored. It could be checked by the controller to make sure it's the expected value, but its value is just a hard-coded string that implies bucket provisioning. The CRD operator is just watching for bucket CRD events, and any event on this type of CRD is s3-compatible bucket related.

The `parameters` property names the object store service and namespace, which relieves the app developer from these concerns.
> a standard API for bucket creation, but that's not the case for user creation (both radosgw and minio do it via a CLI, for example), and the provisioner needs to do both users and buckets.

@liewegas Is the implication that bucket creation would be implemented redundantly in each rook object provisioner? It's clear why we need to do this for users; however, bucket creation could be handled by a single operator with an AWS S3 client.
Signed-off-by: Jon Cope <[email protected]>
Signed-off-by: Jon Cope <[email protected]>
There is a way to provision via CLI and management APIs. But should we go this route?

In the case of the AWS S3 API, the user name itself is pretty much encoded into the v2/v4 authorization. Why would we mix user provisioning into dynamic bucket creation? For instance, in the case of multi-tenant storage (e.g. EdgeFS), in addition to region namespace, we would need to specify tenant name, type of user handling database (built-in, LDAP or AD), permission. Wouldn't it be better to isolate user provisioning from bucket creation to keep things a bit simpler?

If we stick to an AWS S3 compatible API, then we do not need Storage Class indirection IMHO.

Having said that, I do see benefits of doing it in provider dependent ways; for instance, the EdgeFS operator would be able to override Tenant's properties such as Erasure Encoding schemas to use, Replication Count, Encryption schema, etc. If we want to get to that level of simplification - then maybe.

I'd suggest we should stick to a generic Rook provisioning CRD, which would support all the S3 compatible stores via the AWS S3 API. And make this proposal be optional, or perhaps make it operator specific, to begin with.
This bucket provisioning proposal does not depend on a user. It just needs a secret (creds) and an object store service.

This is the core design point of this proposal. By leveraging the S3 API, we can generalize bucket provisioning in a single operator.

This seems to reverse the above quote. Bucket creation via the S3 API is the core concept of this proposal. That said, I'm only familiar with ceph and minio and not edgeFS. Does it provide bucket provisioning operations outside of the S3 API?
Signed-off-by: Jon Cope <[email protected]>
The admin ...

- creates the ObjectBucketClaim Operator (a Deployment API object)
- creates a StorageClass with `parameters[serviceName]=objectstore-service` and `parameters[serviceNamespace]=some-namespace`
- `provisioner` field is left blank. Since only the ObjectBucketClaim operator is watching ObjectBucketClaim instances, defining the operator is unnecessary.
"...defining the operator/provisioner is unnecessary, and if a value is provided it is silently ignored."
Agree?
```yaml
parameters:
  serviceName: my-store
  serviceNamespace: some-namespace
```

1. Since ObjectBucketClaim instances are only watched by the ObjectBucketClaim operator, defining the provisioner is unnecessary.
Like
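For illustration, the `serviceName`/`serviceNamespace` parameters above would resolve to an object store Service along these lines (a sketch; the selector and port values are assumptions about whatever fronts the store):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-store
  namespace: some-namespace
spec:
  selector:
    app: rook-ceph-rgw   # assumed: the pods serving the S3 endpoint
  ports:
  - name: s3
    port: 80             # assumed S3 port
```

The operator would `Get()` this Service to derive the endpoint it records in the generated ConfigMap, as described earlier in the thread.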
**A generalized method of S3 bucket provisioning through the implementation of an ObjectBucketClaim CRD and controller.**

Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for that bucket.
nit: keep all singular or all plural. E.g., "Users will request a bucket via the Kubernetes API and have returned a ConfigMap with connection information for that bucket", or "Users will request buckets via the Kubernetes API and have returned ConfigMaps with connection information for each bucket".
One ConfigMap will be created per ObjectBucketClaim instance. ConfigMap names will be semi-deterministic and derived from the ObjectBucketClaim name. If an ObjectBucketClaim is created with the `generateName` field, then the name cannot be known ahead of time.

Additionally, the ObjectBucketClaim and Pod can be deployed at the same time thanks to in-built synchronization in Kubernetes. Pods mounting the configMap as either `envFrom:` or `env[0].valueFrom.ConfigMapKeyRef` will have a status of `CreateContainerConfigError` until the configMap is created. Pods mounting as a volume will have a status of `ContainerCreating` until the configMap is created. In both cases, the Pod will start once the configMap exists.
nit: Pods mounting as a volume will have ... --> "Pods mounting a configMap as a volume will have"
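A sketch of that volume-mount variant (Pod name, image, and the ConfigMap name are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: s3-worker                     # hypothetical name
  namespace: dev-user
spec:
  containers:
  - name: app
    image: example/s3-worker:latest   # placeholder image
    volumeMounts:
    - name: bucket-config
      mountPath: /etc/bucket          # each ConfigMap key appears as a file here
  volumes:
  - name: bucket-config
    configMap:
      name: my-bucket-1               # assumed: ConfigMap named after the claim
```

This Pod sits in `ContainerCreating` until the ConfigMap exists, then starts normally.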
#### Assumptions

- An object store has been provisioned with a reachable endpoint. ([e.g. the Rook-Ceph Object provisioner](https://rook.github.io/docs/rook/master/ceph-object.html)) This may also be an external endpoint such as AWS S3.
- A Service and/or Endpoint API object exists to route connections to the object store. Typically, object store operators generate these. In cases where the store is external to the cluster, they must be configured manually.
nit: "A Service and/or Endpoint API object exists" — I'd be explicit and say "A Kubernetes Service and/or Endpoint API object exists".
**Use Case: Expose an Object Store Endpoint**

_As a Kubernetes admin, I want to expose an object store to cluster users for bucket provisioning via the Kubernetes API._
I don't see a difference between this use-case and the one above.
editing mistake, will fix
    rook.io/bucket-provisioner:
    rook.io/object-bucket-claim: my-bucket-1 [1]
spec:
  storageClassName: some-object-store
Do you see value in supporting both `storageClass` and `store`? See the 2482 comment regarding providing both. A kick-the-tires use case might be better served if the object store service name can be supplied directly without reliance on a storage admin to create a storage class.
It's a good point. Instead of `store:` though, it should either be a store url or a kubernetes service name. I lean towards URL.
  namespace: dev-user
  labels:
    rook.io/bucket-provisioner:
    rook.io/object-bucket-claim: my-bucket-1 [1]
do you think the operator should also add an anno containing the configMap name? It can be easily derived so not essential...
It's a reasonable thought. Another way to get the associated CM would be by the `rook.io/object-bucket-claim: my-bucket-1` label selector. It's slightly more obtuse - what do you think?
```yaml
apiVersion: rook.io/v1alpha2
kind: ObjectBucketClaim
metadata:
```
Should there be an `ownerReferences` value pointing to the ceph cluster here? IOW, when the cluster is deleted this CRD would also be deleted. Or is it better long term to not tie this CRD to the rook-ceph cluster?
Signed-off-by: Jon Cope <[email protected]>
It does. This proposal makes sense indeed. How are we going to deal with region/namespace and tenantName? Will we pass it down to the storage provider level?

IMO region/namespace would fit best in the StorageClass as an optional parameter. This puts the onus on the cluster admin to know if the object provider they are using supports/expects regions. Through RBAC, cluster admins can define access to each region per namespace by limiting access to a subset of storageClasses.

After some cursory reading, my understanding is that tenant names are optional in aws s3, minio, ceph, and edgefs. If left empty, an empty string is applied as the tenant name. I think we have a few options in how we could address the feature. I'm undecided on which would be the most reasonable choice (or if there are others I haven't thought of). @dyusupov @liewegas @travisn @jeffvance thoughts?

This works for edgefs. The difference I guess from other providers is that EdgeFS needs namespace (btw, can be any and not necessarily limited by AWS dictations) as well as tenantName in order to build a universally unique object path, e.g. NAMESPACE/TENANT/BUCKET/OBJECT. It is called Name Hash ID and is used for cross-site lookups, metadata-only transfers, data I/O mesh policy, etc.

I would do (1) and (2). We could defer (1) to a separate pull request while having (2) be predefined as a part of this pull request.

I think this seems reasonable. Implementing (2) early on would lay a framework for handling defined tenants. (1) would follow up with a change in how the information is delivered to that framework. I'll add these changes to the doc.

Re tenants: I haven't thought this through, so just pitching it as a possibility. Perhaps options 0 and 2 can be combined. My thinking is that I don't want to slow down the dev of bucket provisioning and option 2 could be done later.
A cluster user will request a bucket via the Kubernetes API and have returned a ConfigMap with connection information for that bucket.

One ConfigMap will be created per ObjectBucketClaim instance. ConfigMap names will be semi-deterministic and derived from the ObjectBucketClaim name. If an ObjectBucketClaim is created with the `generateName` field, then the name cannot be known ahead of time.
Why not create an `ObjectBucket` object, like we do with PVC and PV?
The K8s Pod API expects a mountable/attachable k8s primitive (configMap, secret) for data injection, so we have to use a core API resource instead of a PV analog CRD.
  namespace: dev-user
data:
  accessKeyId: <base64 encoded string>
  secretAccessKey: <base64 encoded string>
I'd advocate against the use of AWS secrets.
1st AWS is working on binding IAM to pods.
2nd in the meantime, I'm fine with scheduling the pod on nodes that have the right instance profile.
> I'd advocate against the use of AWS secrets. 1st AWS is working on binding IAM to pods.

Will this work for object stores besides AWS S3 (ceph object, edgefs, minio, etc)?

> 2nd in the meantime, I'm fine with scheduling the pod on nodes that have the right instance profile.

Does this assume multiple cluster users would use the same instance profile? If user A and user B work within their own namespaces and require access to an object store, would both of their pods potentially use the same instance profile?
Thank you for meeting to discuss this further today. To summarize the discussion, we think that this support for generalized bucket provisioning is something that the greater Kubernetes community wants to see and has value. We also agree that being able to demonstrate something working, as opposed to just having a "grand idea", is more effective at getting community buy-in and rallying support.

We think this is going in the right direction, but the concern here is that this work would be better served in a different venue; it might not be a strong fit in the scope of the Rook project. Bucket provisioning would be better suited at the workload level, supported by a scheduler, topology aware placement, dynamic/static resource matching and binding, separation of concerns (admin creds to create bucket, app creds to consume bucket, user management), etc. Essentially more machinery around the consuming workload. A design that has the abstraction closer to the app level and scheduling/provisioning is a stronger approach. That would be similar to the PVC/PV model where the claim abstraction is closer to the workload and scheduler.

Rook should be focusing on orchestrating and integrating concrete implementations of storage. Rook should be bringing new providers of storage into this ecosystem that implement these abstractions, but Rook shouldn't go beyond the storage (e.g., into scheduling and workload machinery). This would likely be a better fit for the following venues:

We are very open to collaborating and incorporating this effort into the Crossplane project. You can read more in the Crossplane vision and architecture doc.

We need to figure out what to split out and what to keep from the design in this PR. I think the generalized bucket abstraction and a bucket controller make sense split out, while a concrete Ceph bucket CRD makes more sense here in Rook. We can discuss further, and once again thank you for your efforts and driving this initiative.
@jbw976 @bassam @liewegas @travisn @eboyd @jeffvance Thank you for sharing your time with us; we took a lot away from the meeting and it's given us a better sense of the direction this project should head. We firmly agree that a generalized implementation of a bucket provisioner would be better implemented somewhere other than Rook. With that in mind, I'm going to go ahead and close this PR so that we can start a fresh dialogue within the scope of a concrete Ceph CRD design. The benefit is that the conversation won't contain the history of drastic changes from specific to generalized and back to specific design scope. I'll reference this PR in the next one so that we can still keep track of the overall history of the proposal.
**Description of your changes:**

Design documentation for the addition of an ObjectBucketClaim CRD and controller for the provisioning of S3 buckets. This design should be agnostic to the underlying provider.

Which issue is resolved by this Pull Request:

Resolves #2480

Checklist:

- `make codegen` has been run to update object specifications, if necessary.