From bdc1a352f6ec049ba9007d64cbde060d8f8ce4fe Mon Sep 17 00:00:00 2001 From: Srishti Thakkar Date: Thu, 1 Oct 2020 20:54:37 +0530 Subject: [PATCH] Updating documentation (#77) * Updating documentation Signed-off-by: SrishT * Updating documentation and adding version validation Signed-off-by: SrishT * Webhook changes Signed-off-by: SrishT * Addressing review comments Signed-off-by: SrishT * Addressing review comments Signed-off-by: SrishT * Changing default storage class Signed-off-by: SrishT Co-authored-by: SrishT --- README.md | 44 +++++---------- charts/bookkeeper-operator/README.md | 32 ++++++----- .../templates/webhook.yaml | 5 +- charts/bookkeeper/README.md | 41 ++++++++++---- charts/bookkeeper/templates/bookkeeper.yaml | 33 +++++------ charts/bookkeeper/values.yaml | 32 +++++------ deploy/webhook.yaml | 1 + doc/manual-installation.md | 6 +- doc/operator-upgrade.md | 29 +++++----- doc/rbac.md | 2 +- doc/release_process.md | 56 +++++++++---------- doc/rollback-cluster.md | 20 +++---- doc/upgrade-cluster.md | 23 ++++---- doc/webhook.md | 34 +++++------ tools/manifest_files/webhook.yaml | 5 +- tools/operatorUpgrade.sh | 2 +- 16 files changed, 188 insertions(+), 177 deletions(-) diff --git a/README.md b/README.md index 722e57f1..9e5b047b 100644 --- a/README.md +++ b/README.md @@ -26,7 +26,7 @@ The project is currently alpha. While no breaking API changes are currently plan ## Overview -[Bookkeeper](https://bookkeeper.apache.org/) A scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads. +[Bookkeeper](https://bookkeeper.apache.org/) is a scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads. The Bookkeeper Operator manages Bookkeeper clusters deployed to Kubernetes and automates tasks related to operating a Bookkeeper cluster. 
@@ -42,40 +42,26 @@ The Bookkeeper Operator manages Bookkeeper clusters deployed to Kubernetes and a ## Quickstart +We recommend using our [helm charts](charts) for all installation and upgrades (but not for rollbacks at the moment since helm rollbacks are still experimental). The helm charts for bookkeeper operator (version 0.1.2 onwards) and bookkeeper cluster (version 0.5.0 onwards) are published in [https://charts.pravega.io](https://charts.pravega.io/). To add this repository to your Helm repos, use the following command: +``` +helm repo add pravega https://charts.pravega.io +``` +There are manual deployment, upgrade and rollback options available as well. + ### Install the Operator > Note: If you are running on Google Kubernetes Engine (GKE), please [check this first](doc/development.md#installation-on-google-kubernetes-engine). -We recommend using helm to deploy a Bookkeeper Operator. Check out the [helm installation](charts/bookkeeper-operator/README.md) document for instructions. +To understand how to deploy a Bookkeeper Operator using helm, refer to [this](charts/bookkeeper-operator#installing-the-chart). + #### Install the Operator in Test Mode The Operator can be run in `test mode` if we want to deploy the Bookkeeper Cluster on minikube or on a cluster with very limited resources by setting `testmode: true` in `values.yaml` file. Operator running in test mode skips the minimum replica requirement checks. Test mode provides a bare minimum setup and is not recommended to be used in production environments. ### Install a sample Bookkeeper cluster -> Note that the Bookkeeper cluster must be installed in the same namespace as the Zookeeper cluster. 
- -If the Bookkeeper cluster is expected to work with Pravega, we need to create a ConfigMap which needs to have the following values - -| KEY | VALUE | -|---|---| -| *PRAVEGA_CLUSTER_NAME* | Name of Pravega Cluster using this BookKeeper Cluster | -| *WAIT_FOR* | Zookeeper URL | - -The name of this ConfigMap needs to be mentioned in the field `envVars` present in the BookKeeper Spec. For more details about this ConfigMap refer to [this](doc/bookkeeper-options.md#bookkeeper-custom-configuration). - -Helm can be used to install a sample Bookkeeper cluster with the release name `bookkeeper`. - -``` -$ helm install bookkeeper charts/bookkeeper --set zookeeperUri=[ZOOKEEPER_HOST] --set pravegaClusterName=[CLUSTER_NAME] -``` - -where: - -- **[ZOOKEEPER_HOST]** is the Zookeeper service endpoint of your Zookeeper deployment (e.g. `zookeeper-client:2181`). It expects the zookeeper service URL in the given format `:` -- **[CLUSTER_NAME]** is the name of the Pravega cluster (i.e. this field is optional and needs to be provided only if we expect this bookkeeper cluster to work with [Pravega](https://github.com/pravega/pravega)). -Check out the [Bookkeeper Helm Chart](charts/bookkeeper) for more a complete list of installation parameters. +To understand how to deploy a bookkeeper cluster using helm, refer to [this](charts/bookkeeper#installing-the-chart). -Verify that the cluster instances and its components are being created. +Once the bookkeeper cluster with release name `bookkeeper` has been created, use the following command to verify that the cluster instances and its components are being created. ``` $ kubectl get bk @@ -131,9 +117,8 @@ For upgrading the bookkeeper operator check the document [operator-upgrade](doc/ ### Uninstall the Bookkeeper cluster ``` -$ helm uninstall bookkeeper +$ helm uninstall [BOOKKEEPER_RELEASE_NAME] ``` -Here `bookkeeper` is the bookkeeper cluster release name. 
Once the Bookkeeper cluster has been deleted, make sure to check that the zookeeper metadata has been cleaned up before proceeding with the deletion of the operator. This can be confirmed with the presence of the following log message in the operator logs. ``` @@ -142,7 +127,7 @@ zookeeper metadata deleted However, if the operator fails to delete this metadata from zookeeper, you will instead find the following log message in the operator logs. ``` -failed to cleanup metadata from zookeeper (znode path: /pravega/): +failed to cleanup [CLUSTER_NAME] metadata from zookeeper (znode path: /pravega/[PRAVEGA_CLUSTER_NAME]): ``` The operator additionally sends out a `ZKMETA_CLEANUP_ERROR` event to notify the user about this failure. The user can check this event by doing `kubectl get events`. The following is the sample describe output of the event that is generated by the operator in such a case @@ -183,9 +168,8 @@ Events: > Note that the Bookkeeper clusters managed by the Bookkeeper operator will NOT be deleted even if the operator is uninstalled. ``` -$ helm uninstall bookkeeper-operator +$ helm uninstall [BOOKKEEPER_OPERATOR_RELEASE_NAME] ``` -Here `bookkeeper-operator` is the operator release name. ### Manual installation diff --git a/charts/bookkeeper-operator/README.md b/charts/bookkeeper-operator/README.md index 29bd9699..80db0614 100644 --- a/charts/bookkeeper-operator/README.md +++ b/charts/bookkeeper-operator/README.md @@ -11,37 +11,41 @@ This chart bootstraps a [Bookkeeper Operator](https://github.com/pravega/bookkee - Helm 3.2.1+ - An existing Apache Zookeeper 3.6.1 cluster. This can be easily deployed using our [Zookeeper Operator](https://github.com/pravega/zookeeper-operator) - Cert-Manager v0.15.0+ or some other certificate management solution in order to manage the webhook service certificates. 
This can be easily deployed by referring to [this](https://cert-manager.io/docs/installation/kubernetes/) -- An Issuer and a Certificate (either self-signed or CA signed) in the same namespace that the Bookkeeper Operator will be installed (refer to [this](https://github.com/pravega/bookkeeper-operator/blob/master/deploy/certificate.yaml) manifest to create a self-signed certificate in the default namespace) -> The name of the certificate (*webhookCert.certName*), the name of the secret created by this certificate (*webhookCert.secretName*), the tls.crt (*webhookCert.crt*) and tls.key (*webhookCert.key*) need to be specified against the corresponding fields in the values.yaml file, or can be provided with the install command as shown [here](#installing-the-chart). -The values *tls.crt* and *tls.key* are contained in the secret which is created by the certificate and can be obtained using the following command -``` -kubectl get secret -o yaml | grep tls. -``` - + - An Issuer and a Certificate (either self-signed or CA signed) in the same namespace in which the Bookkeeper Operator will be installed (refer to [this](https://github.com/pravega/bookkeeper-operator/blob/master/deploy/certificate.yaml) manifest to create a self-signed certificate in the default namespace) ## Installing the Chart -To install the chart with the release name `my-release`: +To install the bookkeeper-operator chart, use the following commands: ``` -$ helm install my-release bookkeeper-operator --set webhookCert.generate=false --set webhookCert.certName= --set webhookCert.secretName= +$ helm repo add pravega https://charts.pravega.io +$ helm repo update +$ helm install [RELEASE_NAME] pravega/bookkeeper-operator --version=[VERSION] --set webhookCert.certName=[CERT_NAME] --set webhookCert.secretName=[SECRET_NAME] ``` +where: +- **[RELEASE_NAME]** is the release name for the bookkeeper-operator chart +- **[DEPLOYMENT_NAME]** is the name of the bookkeeper-operator deployment that gets created.
(If [RELEASE_NAME] contains the string `bookkeeper-operator`, `[DEPLOYMENT_NAME] = [RELEASE_NAME]`, else `[DEPLOYMENT_NAME] = [RELEASE_NAME]-bookkeeper-operator`. The [DEPLOYMENT_NAME] can however be overridden by providing `--set fullnameOverride=[DEPLOYMENT_NAME]` along with the helm install command) +- **[VERSION]** can be any stable release version for bookkeeper-operator from 0.1.3 onwards +- **[CERT_NAME]** is the name of the certificate created as a prerequisite +- **[SECRET_NAME]** is the name of the secret created by the above certificate + +This command deploys a bookkeeper-operator on the Kubernetes cluster in its default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation. -The command deploys bookkeeper operator on the Kubernetes cluster in the default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation. +>Note: If the bookkeeper-operator version is 0.1.2, webhookCert.certName and webhookCert.secretName should not be set. Also in this case, cert-manager and the certificate/issuer do not need to be deployed as prerequisites. ## Uninstalling the Chart -To uninstall/delete the `my-release` deployment: +To uninstall/delete the bookkeeper-operator chart, use the following command: ``` -$ helm uninstall my-release +$ helm uninstall [RELEASE_NAME] ``` -The command removes all the Kubernetes components associated with the chart and deletes the release. +This command removes all the Kubernetes components associated with the chart and deletes the release. ## Configuration -The following table lists the configurable parameters of the Bookkeeper operator chart and their default values. +The following table lists the configurable parameters of the bookkeeper-operator chart and their default values. 
| Parameter | Description | Default | | ----- | ----------- | ------ | diff --git a/charts/bookkeeper-operator/templates/webhook.yaml b/charts/bookkeeper-operator/templates/webhook.yaml index 2a8ce8ed..9c3d4825 100644 --- a/charts/bookkeeper-operator/templates/webhook.yaml +++ b/charts/bookkeeper-operator/templates/webhook.yaml @@ -23,14 +23,15 @@ metadata: labels: {{ include "bookkeeper-operator.commonLabels" . | indent 4 }} annotations: - cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ .Values.certName }} + cert-manager.io/inject-ca-from: {{ .Release.Namespace }}/{{ .Values.webhookCert.certName }} webhooks: - clientConfig: service: name: bookkeeper-webhook-svc namespace: {{ .Release.Namespace }} - path: /validate-bookkeeper-pravega-io-v1beta1-bookkeepercluster + path: /validate-bookkeeper-pravega-io-v1alpha1-bookkeepercluster name: bookkeeperwebhook.pravega.io + failurePolicy: Fail rules: - apiGroups: - bookkeeper.pravega.io diff --git a/charts/bookkeeper/README.md b/charts/bookkeeper/README.md index 9ec43ef9..af3b5e28 100644 --- a/charts/bookkeeper/README.md +++ b/charts/bookkeeper/README.md @@ -15,24 +15,34 @@ This chart creates a Bookkeeper cluster in [Kubernetes](http://kubernetes.io) us ## Installing the Chart -To install the chart with the release name `my-release`: +To install the bookkeeper chart, use the following commands: ``` -$ helm install my-release bookkeeper +$ helm repo add pravega https://charts.pravega.io +$ helm repo update +$ helm install [RELEASE_NAME] pravega/bookkeeper --version=[VERSION] --set zookeeperUri=[ZOOKEEPER_HOST] --set pravegaClusterName=[PRAVEGA_CLUSTER_NAME] -n [NAMESPACE] ``` +where: +- **[RELEASE_NAME]** is the release name for the bookkeeper chart +- **[CLUSTER_NAME]** is the name of the bookkeeper cluster so created (if [RELEASE_NAME] contains the string `bookkeeper`, `[CLUSTER_NAME] = [RELEASE_NAME]`, else `[CLUSTER_NAME] = [RELEASE_NAME]-bookkeeper`. 
The [CLUSTER_NAME] can however be overridden by providing `--set fullnameOverride=[CLUSTER_NAME]` along with the helm install command) +- **[PRAVEGA_CLUSTER_NAME]** is the name of the pravega cluster (this field is optional and needs to be provided only if we expect the bookkeeper cluster to work with [Pravega](https://github.com/pravega/pravega) and if we wish to override its default value which is `pravega`) +- **[VERSION]** can be any stable release version for bookkeeper from 0.5.0 onwards +- **[ZOOKEEPER_HOST]** is the zookeeper service endpoint of your zookeeper cluster deployment (default value of this field is `zookeeper-client:2181`) +- **[NAMESPACE]** is the namespace in which you wish to deploy the bookkeeper cluster (default value for this field is `default`) The bookkeeper cluster must be installed in the same namespace as the zookeeper cluster. -The command deploys bookkeeper on the Kubernetes cluster in the default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation. +This command deploys bookkeeper on the Kubernetes cluster in its default configuration. The [configuration](#configuration) section lists the parameters that can be configured during installation. ## Uninstalling the Chart -To uninstall/delete the `my-release` deployment: +To uninstall/delete the bookkeeper chart, use the following command: ``` -$ helm uninstall my-release +$ helm uninstall [RELEASE_NAME] ``` -The command removes all the Kubernetes components associated with the chart and deletes the release. -> Note: If you are setting blockOwnerDeletion to false during installtion, PVC's won't be removed automatically while uninstalling bookkeepercluster. PVCs have to be deleted manually. +This command removes all the Kubernetes components associated with the chart and deletes the release. 
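The `[CLUSTER_NAME]` derivation rule quoted in the install section above can be sketched as a tiny helper (illustrative only; the actual name is computed by the chart's fullname template):

```shell
# Sketch of the documented naming rule, not chart code: if the release name
# already contains "bookkeeper" it is used as-is, otherwise "-bookkeeper" is appended.
cluster_name() {
  case "$1" in
    *bookkeeper*) echo "$1" ;;
    *)            echo "$1-bookkeeper" ;;
  esac
}

cluster_name bookkeeper   # prints "bookkeeper"
cluster_name prod         # prints "prod-bookkeeper"
```

The same pattern applies to the operator chart's `[DEPLOYMENT_NAME]`, with `bookkeeper-operator` as the marker string.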
+> Note: If blockOwnerDeletion had been set to false during bookkeeper installation, the PVCs won't be removed automatically while uninstalling the bookkeeper chart, and would need to be deleted manually. + ## Configuration The following table lists the configurable parameters of the Bookkeeper chart and their default values. @@ -47,16 +57,25 @@ The following table lists the configurable parameters of the Bookkeeper chart an | `pravegaClusterName` | Name of the pravega cluster | `pravega` | | `autoRecovery`| Enable bookkeeper auto-recovery | `true` | | `blockOwnerDeletion`| Enable blockOwnerDeletion | `true` | -| `probes` | Timeout configuration of the readiness and liveness probes for the bookkeeper pods | `{}` | +| `probes.readiness.initialDelaySeconds` | Number of seconds after the container has started before the readiness probe is initiated | `20` | +| `probes.readiness.periodSeconds` | How often (in seconds) the readiness probe is performed | `10` | +| `probes.readiness.failureThreshold` | Number of consecutive failures after which the readiness probe is considered failed | `9` | +| `probes.readiness.successThreshold` | Minimum number of consecutive successes for the readiness probe to be considered successful after having failed | `1` | +| `probes.readiness.timeoutSeconds` | Number of seconds after which the readiness probe times out | `5` | +| `probes.liveness.initialDelaySeconds` | Number of seconds after the container has started before the liveness probe is initiated | `60` | +| `probes.liveness.periodSeconds` | How often (in seconds) the liveness probe is performed | `15` | +| `probes.liveness.failureThreshold` | Number of consecutive failures after which the liveness probe is considered failed and the container is restarted | `4` | +| `probes.liveness.successThreshold` | Minimum number of consecutive successes for the liveness probe to be considered successful after having failed | `1` | +| `probes.liveness.timeoutSeconds` | Number of seconds after which the liveness probe times out | `5` | | `resources.requests.cpu` | Requests for CPU resources | `1000m` | | `resources.requests.memory` | Requests for memory resources | `4Gi` | | `resources.limits.cpu` | Limits for CPU resources | `2000m` | | `resources.limits.memory` | Limits for memory resources | `4Gi` | -| `storage.ledger.className` | Storage class name for bookkeeper ledgers | `standard` | +| `storage.ledger.className` | Storage class name for bookkeeper ledgers | `` | | `storage.ledger.volumeSize` | Requested size for bookkeeper ledger persistent volumes | `10Gi` | -| `storage.journal.className` | Storage class name for bookkeeper journals | `standard` | +| `storage.journal.className` | Storage class name for bookkeeper journals | `` | | `storage.journal.volumeSize` | Requested size for bookkeeper journal persistent volumes | `10Gi` | -| `storage.index.className` | Storage class name for bookkeeper index | `standard` | +| `storage.index.className` | Storage class name for bookkeeper index | `` | | `storage.index.volumeSize` | Requested size for bookkeeper index persistent volumes | `10Gi` | | `jvmOptions.memoryOpts` | Memory Options passed to the JVM for bookkeeper performance tuning | `["-Xms1g", "-XX:MaxDirectMemorySize=2g"]` | | `jvmOptions.gcOpts` | Garbage Collector (GC) Options passed to the JVM for bookkeeper bookkeeper performance tuning | `[]` | diff --git a/charts/bookkeeper/templates/bookkeeper.yaml b/charts/bookkeeper/templates/bookkeeper.yaml index b63845ce..42539853 100644 --- a/charts/bookkeeper/templates/bookkeeper.yaml +++ b/charts/bookkeeper/templates/bookkeeper.yaml @@ -19,46 +19,47 @@ spec: probes: {{- if .Values.probes.readiness }} readinessProbe: - initialDelaySeconds: {{ .Values.probes.readiness.initialDelaySeconds }} - periodSeconds: {{ .Values.probes.readiness.periodSeconds }} - failureThreshold: {{ .Values.probes.readiness.failureThreshold }} - successThreshold: {{ .Values.probes.readiness.successThreshold }} - 
timeoutSeconds: {{ .Values.probes.readiness.timeoutSeconds }} + initialDelaySeconds: {{ .Values.probes.readiness.initialDelaySeconds | default 20 }} + periodSeconds: {{ .Values.probes.readiness.periodSeconds | default 10 }} + failureThreshold: {{ .Values.probes.readiness.failureThreshold | default 9 }} + successThreshold: {{ .Values.probes.readiness.successThreshold | default 1 }} + timeoutSeconds: {{ .Values.probes.readiness.timeoutSeconds | default 5 }} {{- end }} {{- if .Values.probes.liveness }} livenessProbe: - initialDelaySeconds: {{ .Values.probes.liveness.initialDelaySeconds }} - periodSeconds: {{ .Values.probes.liveness.periodSeconds }} - failureThreshold: {{ .Values.probes.liveness.failureThreshold }} - successThreshold: {{ .Values.probes.liveness.successThreshold }} - timeoutSeconds: {{ .Values.probes.liveness.timeoutSeconds }} + initialDelaySeconds: {{ .Values.probes.liveness.initialDelaySeconds | default 60 }} + periodSeconds: {{ .Values.probes.liveness.periodSeconds | default 15 }} + failureThreshold: {{ .Values.probes.liveness.failureThreshold | default 4 }} + successThreshold: {{ .Values.probes.liveness.successThreshold | default 1 }} + timeoutSeconds: {{ .Values.probes.liveness.timeoutSeconds | default 5 }} {{- end }} {{- end }} {{- if .Values.resources }} resources: - requests: - cpu: {{ .Values.resources.requests.cpu | quote }} - memory: {{ .Values.resources.requests.memory | quote }} - limits: - cpu: {{ .Values.resources.limits.cpu | quote }} - memory: {{ .Values.resources.limits.memory | quote }} +{{ toYaml .Values.resources | indent 6 }} {{- end }} storage: ledgerVolumeClaimTemplate: accessModes: [ "ReadWriteOnce" ] + {{- if .Values.storage.ledger.className }} storageClassName: {{ .Values.storage.ledger.className }} + {{- end }} resources: requests: storage: {{ .Values.storage.ledger.volumeSize }} journalVolumeClaimTemplate: accessModes: [ "ReadWriteOnce" ] + {{- if .Values.storage.journal.className }} storageClassName: {{ 
.Values.storage.journal.className }} + {{- end }} resources: requests: storage: {{ .Values.storage.journal.volumeSize }} indexVolumeClaimTemplate: accessModes: [ "ReadWriteOnce" ] + {{- if .Values.storage.index.className }} storageClassName: {{ .Values.storage.index.className }} + {{- end }} resources: requests: storage: {{ .Values.storage.index.volumeSize }} diff --git a/charts/bookkeeper/values.yaml b/charts/bookkeeper/values.yaml index 5f968ff4..57f730b0 100644 --- a/charts/bookkeeper/values.yaml +++ b/charts/bookkeeper/values.yaml @@ -20,19 +20,19 @@ pravegaClusterName: pravega autoRecovery: true blockOwnerDeletion: true -probes: {} - # readiness: - # initialDelaySeconds: 20 - # periodSeconds: 10 - # failureThreshold: 9 - # successThreshold: 1 - # timeoutSeconds: 5 - # liveness: - # initialDelaySeconds: 60 - # periodSeconds: 15 - # failureThreshold: 4 - # successThreshold: 1 - # timeoutSeconds: 5 +probes: + readiness: + initialDelaySeconds: 20 + periodSeconds: 10 + failureThreshold: 9 + successThreshold: 1 + timeoutSeconds: 5 + liveness: + initialDelaySeconds: 60 + periodSeconds: 15 + failureThreshold: 4 + successThreshold: 1 + timeoutSeconds: 5 resources: requests: @@ -44,13 +44,13 @@ resources: storage: ledger: - className: standard + className: volumeSize: 10Gi journal: - className: standard + className: volumeSize: 10Gi index: - className: standard + className: volumeSize: 10Gi jvmOptions: diff --git a/deploy/webhook.yaml b/deploy/webhook.yaml index ee0a9b36..f0361c8a 100644 --- a/deploy/webhook.yaml +++ b/deploy/webhook.yaml @@ -27,6 +27,7 @@ webhooks: namespace: default path: /validate-bookkeeper-pravega-io-v1alpha1-bookkeepercluster name: bookkeeperwebhook.pravega.io + failurePolicy: Fail rules: - apiGroups: - bookkeeper.pravega.io diff --git a/doc/manual-installation.md b/doc/manual-installation.md index 7728da10..b54d1033 100644 --- a/doc/manual-installation.md +++ b/doc/manual-installation.md @@ -67,7 +67,7 @@ containers: For more details check 
[this](../README.md#install-the-operator-in-test-mode) ### Install the Bookkeeper cluster manually -> Note that the Bookkeeper cluster must be installed in the same namespace as the Zookeeper cluster. +> Note: the Bookkeeper cluster must be installed in the same namespace as the Zookeeper cluster. If the BookKeeper cluster is expected to work with Pravega, we need to create a ConfigMap which needs to have the following values @@ -76,7 +76,7 @@ If the BookKeeper cluster is expected to work with Pravega, we need to create a | *PRAVEGA_CLUSTER_NAME* | Name of Pravega Cluster using this BookKeeper Cluster | | *WAIT_FOR* | Zookeeper URL | -To create this ConfigMap. +To create this ConfigMap, use the following command: ``` $ kubectl create -f deploy/config_map.yaml @@ -134,7 +134,7 @@ zookeeper metadata deleted However, if the operator fails to delete this metadata from zookeeper, you will instead find the following log message in the operator logs. ``` -failed to cleanup metadata from zookeeper (znode path: /pravega/): +failed to cleanup [CLUSTER_NAME] metadata from zookeeper (znode path: /pravega/[PRAVEGA_CLUSTER_NAME]): ``` The operator additionally sends out a `ZKMETA_CLEANUP_ERROR` event to notify the user about this failure. The user can check this event by doing `kubectl get events`. The following is the sample describe output of the event that is generated by the operator in such a case diff --git a/doc/operator-upgrade.md b/doc/operator-upgrade.md index c800f746..e5bf0c70 100644 --- a/doc/operator-upgrade.md +++ b/doc/operator-upgrade.md @@ -2,14 +2,16 @@ ## Upgrading till 0.1.2 -Bookkeeper operator can be upgraded via helm using the following command +Bookkeeper operator can be upgraded to a version **[VERSION]** via helm using the following command + ``` -$ helm upgrade bookkeeper-operator +$ helm upgrade [BOOKKEEPER_OPERATOR_RELEASE_NAME] pravega/bookkeeper-operator --version=[VERSION] ``` -Here `bookkeeper-operator` is the release name of the operator. 
It can also be upgraded manually by modifying the image tag using the following command +The bookkeeper operator with deployment name **[DEPLOYMENT_NAME]** can also be upgraded manually by modifying the image tag using kubectl edit, patch or apply ``` -$ kubectl edit deploy bookkeeper-operator +$ kubectl edit deploy [DEPLOYMENT_NAME] ``` + ## Upgrading to 0.1.3 ### Pre-requisites @@ -20,24 +22,25 @@ For upgrading Operator to version 0.1.3, the following must be true: 2. Cert-Manager v0.15.0+ or some other certificate management solution must be deployed for managing webhook service certificates. The upgrade trigger script assumes that the user has [cert-manager](https://cert-manager.io/docs/installation/kubernetes/) installed but any other cert management solution can also be used and script would need to be modified accordingly. To install cert-manager check [this](https://cert-manager.io/docs/installation/kubernetes/). -3. Install an Issuer and a Certificate (either self-signed or CA signed) in the same namespace as the Pravega Operator (refer to [this](https://github.com/pravega/bookkeeper-operator/blob/master/deploy/certificate.yaml) manifest to create a self-signed certificate in the default namespace). -> The name of the certificate (*webhookCert.certName*), the name of the secret created by this certificate (*webhookCert.secretName*), the tls.crt (*webhookCert.crt*) and tls.key (*webhookCert.key*) need to be specified against the corresponding fields in the values.yaml file, or can be provided with the upgrade command as shown [here](#triggering-the-upgrade). -The values *tls.crt* and *tls.key* are contained in the secret which is created by the certificate and can be obtained using the following command +3. 
Install an Issuer and a Certificate (either self-signed or CA signed) in the same namespace as the Bookkeeper Operator (refer to [this](https://github.com/pravega/bookkeeper-operator/blob/master/deploy/certificate.yaml) manifest to create a self-signed certificate in the default namespace). + +4. Execute the script `pre-upgrade.sh` inside the [scripts](https://github.com/pravega/bookkeeper-operator/blob/master/scripts) folder. This script patches the `bookkeeper-webhook-svc` with the required annotations and labels. The format of the command is: ``` -kubectl get secret -o yaml | grep tls. +./pre-upgrade.sh [BOOKKEEPER_OPERATOR_RELEASE_NAME] [BOOKKEEPER_OPERATOR_NAMESPACE] ``` -5. Execute the script `pre-upgrade.sh` inside the [scripts](https://github.com/pravega/bookkeeper-operator/blob/master/scripts) folder. This script patches the `bookkeeper-webhook-svc` with the required annotations and labels. - - ### Triggering the upgrade #### Upgrade via helm The upgrade to Operator 0.1.3 can be triggered using the following command ``` -helm upgrade --set webhookCert.generate=false --set webhookCert.certName= --set webhookCert.secretName= +helm upgrade [BOOKKEEPER_OPERATOR_RELEASE_NAME] pravega/bookkeeper-operator --version=0.1.3 --set webhookCert.certName=[CERT_NAME] --set webhookCert.secretName=[SECRET_NAME] ``` +where: +- `[CERT_NAME]` is the name of the certificate that has been created +- `[SECRET_NAME]` is the name of the secret created by the above certificate + #### Upgrade manually -To manually trigger the upgrade to Operator 0.1.3, run the script `operatorUpgrade.sh` under [tools](https://github.com/pravega/bookkeeper-operator/blob/master/tools) folder. This script installs certificate, patches and creates necessary K8s artifacts, needed by 0.1.3 Operator, prior to triggering the upgrade by updating the image tag in Operator deployment. 
+To manually trigger the upgrade to Operator 0.1.3, run the script `operatorUpgrade.sh` under [tools](https://github.com/pravega/bookkeeper-operator/blob/master/tools) folder. This script installs the certificate, patches and creates necessary K8s artifacts, needed by 0.1.3 Operator, prior to triggering the upgrade by updating the image tag in Operator deployment. diff --git a/doc/rbac.md b/doc/rbac.md index 658282ae..77ebe83f 100644 --- a/doc/rbac.md +++ b/doc/rbac.md @@ -4,7 +4,7 @@ You can optionally configure non-default service accounts for the Bookkeeper. -For BookKeeper, set the `serviceAccountName` field under the `spec` block. +For Bookkeeper, set the `serviceAccountName` field under the `spec` block. ``` ... diff --git a/doc/release_process.md b/doc/release_process.md index 80aeb64c..ec5a39e3 100644 --- a/doc/release_process.md +++ b/doc/release_process.md @@ -4,7 +4,7 @@ Pravega Operator follows the [Semantic Versioning](https://semver.org/) model for numbering releases. ## Introduction -This page documents the tagging, branching and release process followed for Pravega Operator. +This page documents the tagging, branching and release process followed for Bookkeeper Operator. ## Types of Releases @@ -13,54 +13,52 @@ This is a Prav This is a minor release with backward compatible changes and bug fixes. 1. Create a new branch with last number bumped up from the existing release branch. - For example, if the existing release branch is 0.3.2, the new branch will be named 0.3.3. - - `$ git clone --branch git@github.com:pravega/pravega-operator.git ` - + For example, if the existing release branch is 0.1.1, the new branch will be named 0.1.2. + + `$ git clone --branch git@github.com:pravega/bookkeeper-operator.git ` + `$ git checkout -b ` - + 2. Cherry pick commits from master/private branches into the release branch. Change operator version in Version.go - - `$ git cherry-pick --signoff ` - +3. 
Make sure all unit and end to end tests pass successfully. + + `$ git cherry-pick --signoff ` + +3. Make sure all unit and end to end tests pass successfully. `$ make test` - + 4. Push changes to the newly created release branch. `$ git push origin ` - -5. Create a new release candidate tag on this branch. - Tag name should correspond to release-branch-name-. - For example: `0.3.3-rc1` for the first release candidate. - + +5. Create a new release candidate tag on this branch. + Tag name should correspond to release-branch-name-. + For example: `0.1.2-rc1` for the first release candidate. + `$ git tag -a -m ""` - + `$ git push origin ` - - It is possible that a release candidate is problematic and we need to do a new release candidate. In this case, we need to repeat this tagging step as many times as needed. - + + It is possible that a release candidate is problematic and we need to do a new release candidate. In this case, we need to repeat this tagging step as many times as needed. + 6. Push docker image for release to docker hub pravega repo: `$ make build-image` - - `$ docker tag pravega/pravega-operator:latest pravega/pravega-operator:` - - `$ docker push pravega/pravega-operator:` - -7. Once a release candidate is tested and there are no more changes needed, push a final release tag and image (like `0.3.3`) + + `$ docker tag pravega/bookkeeper-operator:latest pravega/bookkeeper-operator:` + + `$ docker push pravega/bookkeeper-operator:` + +7. Once a release candidate is tested and there are no more changes needed, push a final release tag and image (like `0.1.2`) 8. Release Notes ### Major Release (Feature + bugfixes) -This has non backward compatible changes. +This has non backward compatible changes. Here, we bump up the middle or most significant digit from earlier release. Follow same steps as minor release. 
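The "last number bumped up" branch-naming step above can be sketched as a tiny helper (illustrative only; this is not part of the release tooling):

```shell
# Hypothetical helper: bump the patch digit of a semver-style release branch name.
bump_patch() {
  IFS=. read -r major minor patch <<< "$1"
  echo "${major}.${minor}.$((patch + 1))"
}

bump_patch 0.3.2   # prints 0.3.3
```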
## Reference
https://github.com/pravega/pravega/wiki/How-to-release
-
-
diff --git a/doc/rollback-cluster.md b/doc/rollback-cluster.md
index d52ff4eb..0a0df1a6 100644
--- a/doc/rollback-cluster.md
+++ b/doc/rollback-cluster.md
@@ -1,7 +1,7 @@
# Bookkeeper Cluster Rollback
-This document details how manual rollback can be triggered after a Bookkeeper cluster upgrade fails.
-Note that a rollback can be triggered only on Upgrade Failure.
+This document details how a rollback can be triggered after a Bookkeeper cluster upgrade fails.
+Note that a rollback can be triggered only after an Upgrade Failure.
## Upgrade Failure
@@ -24,7 +24,7 @@
Status:  True
Reason:  UpgradeFailed
Message:
``` -After an Upgrade Failure the output of `kubectl describe bk bookkeeper` would look like this: +After an Upgrade Failure the output of `kubectl describe bk [CLUSTER_NAME]` would look like this: ``` $> kubectl describe bk bookkeeper @@ -68,18 +68,19 @@ Note: 1. A Rollback to only the last stable cluster version is supported at this point. 2. Changing the cluster spec version to the previous cluster version, when cluster is not in `UpgradeFailed` state, will not trigger a rollback. -## Rollback via Helm +## Rollback via Helm (Experimental) The following command prints the historical revisions of a particular helm release ``` -$ helm history +$ helm history [BOOKKEEPER_RELEASE_NAME] ``` Rollback can be triggered via helm using the following command ``` -$ helm rollback +$ helm rollback [BOOKKEEPER_RELEASE_NAME] [REVISION_NUMBER] --wait --timeout 600s ``` -Rollback will be successfully triggered only if the previous revision number is provided. +Rollback will be successfully triggered only if a [REVISION_NUMBER] corresponding to the last stable cluster version is provided. +>Note: Helm rollbacks are still an experimental feature and are not encouraged. We strongly recommend using manual rollbacks. ## Rollback Implementation @@ -185,7 +186,4 @@ Status: Type: RollbackInProgress ``` -When a rollback failure happens, manual intervention would be required to resolve this. -After checking and solving the root cause of failure, to bring the cluster back to a stable state, a user can upgrade to: -1. The version to which a user initially intended to upgrade.(when upgrade failure was noticed) -2. To any other supported version based versions of all pods in the cluster. +When a rollback failure happens, the operator cannot recover the cluster from this failed state and manual intervention would be required to resolve this. 
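The manual rollback described above amounts to setting `spec.version` back to the last stable version once the cluster reports `UpgradeFailed`. A sketch of the JSON patch involved; the cluster name `bookkeeper` and version `0.6.0` are illustrative:

```shell
# CLUSTER and STABLE_VERSION are example values; substitute your own.
CLUSTER=bookkeeper
STABLE_VERSION=0.6.0

# JSON patch that sets spec.version back to the last stable version.
PATCH='[{"op": "replace", "path": "/spec/version", "value": "'"$STABLE_VERSION"'"}]'

# Against a live cluster this would be applied (only from UpgradeFailed state) as:
#   kubectl patch bk "$CLUSTER" --type='json' -p="$PATCH"
echo "$PATCH"
```

Remember that the operator only honours this change when the cluster is in the `UpgradeFailed` state, and only for the last stable version.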
diff --git a/doc/upgrade-cluster.md b/doc/upgrade-cluster.md
index ce5f52bd..18a07fdf 100644
--- a/doc/upgrade-cluster.md
+++ b/doc/upgrade-cluster.md
@@ -1,6 +1,6 @@
# Bookkeeper cluster upgrade
-This document shows how to upgrade a Pravega cluster managed by the operator to a desired version while preserving the cluster's state and data whenever possible.
+This document shows how to upgrade a Bookkeeper cluster managed by the Bookkeeper Operator to a desired version while preserving the cluster's state and data whenever possible.
## Overview
@@ -21,23 +21,24 @@
bookkeeper   0.4.0     7                 7              11m
## Valid Upgrade Paths
-To understand the valid upgrade paths for a pravega cluster, refer to the [version map](https://github.com/pravega/bookkeeper-operator/blob/master/deploy/version_map.yaml). The key indicates the base version of the cluster, and the value against each key indicates the list of valid versions this base version can be upgraded to.
+To understand the valid upgrade paths for a Bookkeeper cluster, refer to the [version map](https://github.com/pravega/bookkeeper-operator/blob/master/deploy/version_map.yaml). The key indicates the base version of the cluster, and the value against each key indicates the list of valid versions this base version can be upgraded to.
## Trigger an upgrade
### Upgrading via Helm
-The upgrade can be triggered via helm using the following command
+The upgrade of the Bookkeeper cluster from a version **[OLD_VERSION]** to **[NEW_VERSION]** can be triggered via helm using the following command:
```
-$ helm upgrade --timeout 600s
+$ helm upgrade [BOOKKEEPER_RELEASE_NAME] pravega/bookkeeper --version=[NEW_VERSION] --set version=[NEW_VERSION] --reuse-values --timeout 600s
```
+**Note:** By specifying the `--reuse-values` option, the configuration of all parameters is retained across upgrades. However, if some values need to be modified during the upgrade, the `--set` flag can be used to specify the new configuration for these parameters.
Also, by skipping the `reuse-values` flag, the values of all parameters are reset to the default configuration that has been specified in the published charts for version [NEW_VERSION]. ### Upgrading manually To initiate the upgrade process manually, a user has to update the `spec.version` field on the `BookkeeperCluster` custom resource. This can be done in three different ways using the `kubectl` command. -1. `kubectl edit BookkeeperCluster `, modify the `version` value in the YAML resource, save, and exit. +1. `kubectl edit BookkeeperCluster [CLUSTER_NAME]`, modify the `version` value in the YAML resource, save, and exit. 2. If you have the custom resource defined in a local YAML file, e.g. `bookkeeper.yaml`, you can modify the `version` value, and reapply the resource with `kubectl apply -f bookkeeper.yaml`. -3. `kubectl patch BookkeeperCluster --type='json' -p='[{"op": "replace", "path": "/spec/version", "value": "X.Y.Z"}]'`. +3. `kubectl patch BookkeeperCluster [CLUSTER_NAME] --type='json' -p='[{"op": "replace", "path": "/spec/version", "value": "X.Y.Z"}]'`. After the `version` field is updated, the operator will detect the version change and it will trigger the upgrade process. ## Upgrade process @@ -51,9 +52,9 @@ The upgrade workflow is as follows: - When all pods are upgraded, the `Upgrade` condition will be set to `False` and `status.currentVersion` will be updated to the desired version. -### BookKeeper upgrade +### Bookkeeper upgrade -BookKeeper cluster is deployed as a [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) due to its requirements on: +Bookkeeper cluster is deployed as a [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) due to its requirements on: - Persistent storage: each bookie has three persistent volume for ledgers, journals, and indices. If a pod is migrated or recreated (e.g. when it's upgraded), the data in those volumes will remain untouched. 
- Stable network names: the `StatefulSet` provides pods with a predictable name and a [Headless service](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services) creates DNS records for pods to be reachable by clients. If a pod is recreated or migrated to a different node, clients will continue to be able to reach the pod despite changing its IP address.
@@ -65,9 +66,9 @@ Statefulset [upgrade strategy](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#update-strategies)
In both cases, the upgrade is initiated when the Pod template is updated.
-For BookKeeper, the operator uses an `OnDelete` strategy. With `RollingUpdate` strategy, you can only check the upgrade status once all pods get upgraded. On the other hand, with `OnDelete` you can keep updating pod one by one and keep checking the application status to make sure the upgrade working fine. This allows the operator to have control over the upgrade process and perform verifications and actions before and after a BookKeeper pod is upgraded. For example, checking that there are no under-replicated ledgers before upgrading the next pod. Also, the operator might be need to apply migrations when upgrading to a certain version.
+For Bookkeeper, the operator uses an `OnDelete` strategy. With the `RollingUpdate` strategy, you can only check the upgrade status once all pods have been upgraded. On the other hand, with `OnDelete` you can upgrade pods one by one while checking the application status to make sure the upgrade is working fine. This allows the operator to have control over the upgrade process and perform verifications and actions before and after a Bookkeeper pod is upgraded. For example, checking that there are no under-replicated ledgers before upgrading the next pod. Also, the operator might need to apply migrations when upgrading to a certain version.
-BookKeeper upgrade process is as follows:
+Bookkeeper upgrade process is as follows:
1. 
Statefulset Pod template is updated to the new image and tag according to the Pravega version.
2. Pick one outdated pod
@@ -75,7 +76,7 @@ BookKeeper upgrade process is as follows:
4. Delete the pod. The pod is recreated with an updated spec and version
5. Wait for the pod to become ready. If it fails to start or times out, the upgrade is cancelled. Check [Recovering from a failed upgrade](#recovering-from-a-failed-upgrade)
6. Apply post-upgrade actions and verifications
-7. If all pods are updated, BookKeeper upgrade is completed. Otherwise, go to 2.
+7. If all pods are updated, the Bookkeeper upgrade is completed. Otherwise, go to 2.
### Monitor the upgrade process
diff --git a/doc/webhook.md b/doc/webhook.md
index 52288adb..e6025347 100644
--- a/doc/webhook.md
+++ b/doc/webhook.md
@@ -1,23 +1,25 @@
## Admission Webhook
[Admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) are HTTP callbacks that receive admission requests and do something with them.
-There are two webhooks [ValidatingAdmissionWebhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook) and
-[MutatingAdmissionWebhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook) which are basically
-doing the same thing except MutatingAdmissionWebhook can modify the requests. In our case, we use MutatingAdmissionWebhook because it can validate requests as well as mutating them. E.g. clear the image tag
-if version is specified.
+There are two webhooks, [ValidatingAdmissionWebhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook) and
+[MutatingAdmissionWebhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook), which do essentially the same thing, except that MutatingAdmissionWebhook can also modify the requests.
In our case, we use a ValidatingAdmissionWebhook so that it can reject requests in order to enforce custom policies (which in our case means ensuring that the user is unable to install an invalid bookkeeper version or upgrade to any unsupported bookkeeper version).
-In the Bookkeeper operator repo, we are leveraging the webhook implementation from controller-runtime package, here is the [GoDoc](https://godoc.org/sigs.k8s.io/controller-runtime/pkg/webhook).
-In detail, there are two steps that developers need to do 1) create webhook server and 2) implement the handler.
+In the Bookkeeper Operator repo, we leverage the webhook implementation from the controller-runtime package; see the [GoDoc](https://godoc.org/sigs.k8s.io/controller-runtime/pkg/webhook).
+
+If you want to implement admission webhooks for your CRD, the only thing you need to do is to implement the `Defaulter` and/or the `Validator` interface. Kubebuilder takes care of the rest for you, such as:
+- Creating the webhook server.
+- Ensuring the server has been added in the manager.
+- Creating handlers for your webhooks.
+- Registering each handler with a path in your server.
The webhook server registers webhook configuration with the apiserver and creates an HTTP server to route requests to the handlers.
-The server is behind a Kubernetes Service and provides a certificate to the apiserver when serving requests. The kubebuilder has a detailed instruction of
-building a webhook, see [here](https://github.com/kubernetes-sigs/kubebuilder/blob/86026527c754a144defa6474af6fb352143b9270/docs/book/beyond_basics/sample_webhook.md).
+The server is behind a Kubernetes Service and provides a certificate to the apiserver when serving requests.
+Kubebuilder has detailed instructions on building a webhook; see [here](https://book.kubebuilder.io/cronjob-tutorial/webhook-implementation.html).
-The webhook feature itself is enabled by default but it can be disabled if `webhook=false` is specified when installing the
-operator locally using `operator-sdk up local`. E.g. ` operator-sdk up local --operator-flags -webhook=false`. The use case of this is that webhook needs to be
-disabled when developing the operator locally since webhook can only be deployed in Kubernetes environment.
+The webhook feature itself is enabled by default, but it can be disabled by specifying `webhook=false` when running the
+operator locally using `operator-sdk run --local`, e.g. `operator-sdk run --local --operator-flags -webhook=false`. This is needed when developing the operator locally, since the webhook can only be deployed in a Kubernetes environment.
### How to deploy
-The webhook is deployed along with the Bookkeeper operator, thus there is no extra steps needed. However, there are some configurations that are necessary to make webhook work.
+The ValidatingAdmissionWebhook and the webhook service should be deployed using the provided manifest `webhook.yaml` while deploying the Bookkeeper Operator. However, there are some configurations that are necessary to make the webhook work.
1. Permission
@@ -27,7 +29,7 @@
an example of the additional permission
- apiGroups:
  - admissionregistration.k8s.io
  resources:
-  - mutatingwebhookconfigurations
+  - validatingwebhookconfigurations
  verbs:
  - '*'
```
@@ -36,9 +38,7 @@
The webhook will deploy a Kubernetes service. This service will need to select the operator pod as its
backend. The way to select is using Kubernetes label selector and user will need to specify `"component": "bookkeeper-operator"` as the label
-``` +when deploying the Bookkeeper operator deployment. ### What it does -The webhook maintains a compatibility matrix of the Bookkeeper versions. Reuqests will be rejected if the version is not valid or not upgrade compatible -with the current running version. Also, all the upgrade requests will be rejected if the current cluster is in upgrade status. +The webhook maintains a compatibility matrix of the Bookkeeper versions. Requests will be rejected if the version is not valid or not upgrade compatible with the current running version. Also, all the upgrade requests will be rejected if the current cluster is in upgrade status. diff --git a/tools/manifest_files/webhook.yaml b/tools/manifest_files/webhook.yaml index f4dbc34d..f0361c8a 100644 --- a/tools/manifest_files/webhook.yaml +++ b/tools/manifest_files/webhook.yaml @@ -2,7 +2,7 @@ apiVersion: v1 kind: Service metadata: name: bookkeeper-webhook-svc - namespace: default + namespace: default spec: ports: - port: 443 @@ -24,9 +24,10 @@ webhooks: - clientConfig: service: name: bookkeeper-webhook-svc - namespace: default + namespace: default path: /validate-bookkeeper-pravega-io-v1alpha1-bookkeepercluster name: bookkeeperwebhook.pravega.io + failurePolicy: Fail rules: - apiGroups: - bookkeeper.pravega.io diff --git a/tools/operatorUpgrade.sh b/tools/operatorUpgrade.sh index ac4d91f0..b4d0591a 100755 --- a/tools/operatorUpgrade.sh +++ b/tools/operatorUpgrade.sh @@ -5,7 +5,7 @@ echo "Running pre-upgrade script for upgrading bookeeper operator from version p if [ "$#" -ne 3 ]; then echo "Error : Invalid number of arguments" - Usage: "./operatorUpgrade.sh " + Usage: "./operatorUpgrade.sh " exit 1 fi
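As the webhook documentation above notes, the ValidatingAdmissionWebhook rejects version changes that are not in the compatibility matrix. A quick way to observe this against a deployed cluster; the cluster name `bookkeeper` and the version `99.0.0` are illustrative:

```shell
# With the operator and webhook.yaml deployed, a request to move to a version
# that is not in the version map is rejected at admission time, so the
# cluster spec is never changed.
kubectl patch BookkeeperCluster bookkeeper --type='json' \
  -p='[{"op": "replace", "path": "/spec/version", "value": "99.0.0"}]'
# The apiserver returns a denial originating from the
# bookkeeperwebhook.pravega.io webhook, and the cluster keeps running
# its current version.
```

Because `failurePolicy: Fail` is set in `webhook.yaml`, such requests are also rejected if the webhook service itself is unreachable.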