Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pravega-operator pod crashing if WATCH_NAMESPACE not set #205

Closed
jkhalack opened this issue Jun 3, 2019 · 11 comments · Fixed by #236
Closed

Pravega-operator pod crashing if WATCH_NAMESPACE not set #205

jkhalack opened this issue Jun 3, 2019 · 11 comments · Fixed by #236
Assignees
Labels
version 0.4.0 Issue with Operator 0.4.0

Comments

@jkhalack
Copy link

jkhalack commented Jun 3, 2019

Problem Description

We are deploying pravega-operator with the intent for it to watch all namespaces (so WATCH_NAMESPACE=''). If used to work just fine in version 0.3.2.
When we try to switch to pravega-operator:0.4.0, the pravega-operator keeps crashing.

Pravega-operator Log

$ kubectl logs pravega-operator-5ff4459676-bz2qz -n pravega
time="2019-06-03T14:27:38Z" level=info msg="pravega-operator Version: 0.4.0"
time="2019-06-03T14:27:38Z" level=info msg="Git SHA: 2ee088a"
time="2019-06-03T14:27:38Z" level=info msg="Go Version: go1.10.1"
time="2019-06-03T14:27:38Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-06-03T14:27:38Z" level=info msg="operator-sdk Version: v0.4.0"
time="2019-06-03T14:27:38Z" level=info msg="Registering Components"
2019/06/03 14:27:38 Initializing webhook
time="2019-06-03T14:27:38Z" level=info msg="Starting the Cmd"
time="2019-06-03T14:27:38Z" level=info msg="Reconciling PravegaCluster pravega/pravega\n"
time="2019-06-03T14:27:38Z" level=fatal msg="MutatingWebhookConfiguration.admissionregistration.k8s.io \"mutating-webhook-configuration\" is invalid: webhooks[0].clientConfig.service.namespace: Required value: service namespace is requiredmanager exited non-zero"

Problem Location
Webhook code

@Tristan1900
Copy link
Member

Closing this since we agree to use WATCH_NAMESPACE

@jkhalack
Copy link
Author

Does WATCH_NAMESPACE='' work now?

@jkhalack
Copy link
Author

jkhalack commented Jul 17, 2019

There are no changes to the code, and the issue is a regression from 0.3.x.
Reopening

@jkhalack
Copy link
Author

In 0.3.2, pravega-operator was supposedly able to create pravega-cluster in more than one namespace.
@pbelgundi Are you saying this was not the case? Or are you saying that pravega-operator needs to regress going forward?

@pbelgundi
Copy link
Contributor

pbelgundi commented Jul 22, 2019

@jkhalack, pravega-operator always creates the cluster in one namespace. There is a 1-to-1 mapping between an operator instance and pravega-cluster instance and the 'operator' and 'pravega-cluster' always need to exist in the same namespace for the operator to function correctly.

@jkhalack
Copy link
Author

@pbelgundi
If one and the same namespace is the only behaviour, then the operator should not be crashing if WATCH_NAMESPACE is not set (see the title and the description of the issue).
The operator should be able to proceed without the user doing it manually.
The operator should overwrite the WATCH_NAMESPACE variable to use its own namespace if it is the only supported usecase).

@EronWright
Copy link
Contributor

EronWright commented Jul 26, 2019

Somehow two aspects are being mixed up; here is the typical/correct solution.

  1. WATCH_NAMESPACE is the environment variable that controls which namespaces are watched by the controller. The default value is blank to watch all namespaces.
  2. WEBHOOK_NAMESPACE is the environment variable to control the namespace of the Service resource that exposes the webhook endpoint. It has nothing to do with the resources that are watched.

@pbelgundi is making an undue assertion about how many Pravega clusters could be managed by an instance of the operator. It would be unusual to limit it as was suggested. That said, Nautilus deploys one instance.

The wrong conclusion was reached in #206. @Tristan1900 had the correct solution IMO. Note that the WEBHOOK_NAMESPACE variable is typically set using the Kubernetes Downward API:

      env:
        - name: WEBHOOK_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

If you survey the various operators that are based on Operator SDK, you will see the pattern that I described.

@pbelgundi
Copy link
Contributor

@jkhalack, I agree that operator should not be crashing when WATCH_NAMESPACE is not set and we have merged a PR for the same (see merged PR above)

@pbelgundi
Copy link
Contributor

@EronWright I agree that WATCH_NAMESPACE and NAMESPACE in which the operator itself runs can be 2 separate namespaces and may have nothing to do with each other.
As rightly suggested by you WATCH_NAMESPACE is the namespace the operator "watches".

However, for these reasons I do not see the need for a separate "WEBHOOK_NAMESPACE" variable, that would decide the namespace for webhook service:

  1. The webhook service is created by operator and used by operator. It is not an existing external service running in a different namespace, that the operator plans to use for its validation.
  2. This service is not a generic service that can be used by any other component and its sole consumer is the operator.
  3. I don't see any reason to have this hosted in a different namespace than the operator itself.

As such, above PR makes sure webhook service is always deployed in the same namespace where the operator itself runs and there is no need for a separate WEBHOOK_NAMESPACE to be set.

@pbelgundi
Copy link
Contributor

Closing this issue since setting WATCH_NAMESPACE="" does not cause operator to fail after this fix.

@EronWright
Copy link
Contributor

EronWright commented Aug 22, 2019

@pbelgundi to better explain my rationale for suggesting WeBHOOK_NAMESPACE, a few points:

  1. WEBHOOK_NAMESPACE is a convention seen in operators based on the controller-runtime library.
  2. It is possible to run an operator from outside the cluster, e.g. in a dev mode. I routinely do this to be able to set breakpoints. This works even when the cluster is remote. In that scenario, the operator doesn't have a namespace, and the call to k8sutil.GetOperatorNamespace() would fail.
  3. The README.md of project operator discusses how to develop an operator on your desktop.

The latest version of controller-runtime has a breaking change, it doesn't program the webhook service anymore. So, this discussion is moot, don't feel a need to change anything now.

ref: https://github.com/operator-framework/operator-sdk/blob/v0.10.x/pkg/k8sutil/k8sutil.go#L62

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
version 0.4.0 Issue with Operator 0.4.0
Projects
None yet
5 participants