
AWS: Authentication and TLS Support #1435

Closed
karlschriek opened this issue Dec 2, 2019 · 32 comments


@karlschriek

https://www.kubeflow.org/docs/aws/authentication/

It would be very useful if there was a more comprehensive guide / some troubleshooting assistance for this. Setting up authenticated access is pretty far from trivial.

I have followed the steps above, but I am unable to reach the authentication screen. This is very hard to troubleshoot, since the problem could be in any one of:

  1. Cognito user pool
  2. The Route 53 CNAME definition
  3. The Istio YAML definition
  4. AWS Certificate management

There are also a few inconsistencies on the page that are either mistakes or otherwise need to be fully explained:

  • The example shows registering a custom domain (www.shanjiaxin.com in this case), but then in the Amazon Cognito app client settings, this is entered as www.shanjiaxin.com/oauth2/idresponse. Why isn't it just www.shanjiaxin.com? This should be explained!

  • Under Cognito Domain Name the value kubeflow-testing is entered. In the example YAML snippet below you have cognitoUserPoolDomain: your-user-pool. As far as I can tell, this should be cognitoUserPoolDomain: kubeflow-testing to be consistent with the rest of the example.

  • For adding a CNAME, the Points-To value appears to be the ALB endpoint. It should be explained how this endpoint can be found and whether it should be entered exactly as shown. Since most users will probably use Route 53 for this, a guide that shows the steps for Route 53 would be sensible.

@issue-label-bot

Issue-Label Bot is automatically applying the label kind/feature to this issue, with a confidence of 0.89.


sarahmaddox changed the title "Authentication and TLS Support" to "AWS: Authentication and TLS Support" on Dec 2, 2019
@sarahmaddox
Contributor

@Jeffwan FYI

@Jeffwan
Member

Jeffwan commented Dec 3, 2019

I will improve it by EOD. Sorry for the delay; we are at an AWS conference, so responses will be slow.

@karlschriek
Author

> I will improve it by EOD. Sorry for the delay; we are at an AWS conference, so responses will be slow.

@Jeffwan, as always, thanks for the prompt response! It doesn't have to be today for my sake; I will likely only get back to this next week at the earliest, possibly not until January.

sarahmaddox added the doc-sprint label (Issues to work on during the Kubeflow Doc Sprint) on Jan 1, 2020
@jtfogarty

/priority p2

@dilzeem

dilzeem commented Feb 3, 2020

Hi @karlschriek, did you end up solving this? I am getting similar issues.

My ALB doesn't seem to have an HTTPS listener at all when I deploy Kubeflow using kfctl_aws_cognito.0.7.1.yaml.

To get authentication to work, you have to create the listener manually and set it up.

Also, there seems to be no target created that the listener can attach to. When I try to add the listener manually and forward to a target group, the option is greyed out.
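One way to confirm whether the HTTPS listener and the Cognito rule were actually created is to inspect the load balancer through the Elastic Load Balancing v2 API instead of the console. The sketch below assumes the response shapes of boto3's `elbv2` client (`describe_listeners` returns `Listeners` with `Port`/`Protocol`, `describe_rules` returns `Rules` with typed `Actions`); the helper names are hypothetical, and the functions only inspect responses, so they can be tried on sample data without AWS credentials. Listener default actions could be checked the same way.

```python
from typing import Any, Dict, List


def https_listener_arns(listeners_response: Dict[str, Any]) -> List[str]:
    """ARNs of HTTPS listeners on port 443 in an elbv2 describe_listeners response."""
    return [
        listener["ListenerArn"]
        for listener in listeners_response.get("Listeners", [])
        if listener.get("Protocol") == "HTTPS" and listener.get("Port") == 443
    ]


def has_cognito_forward_rule(rules_response: Dict[str, Any]) -> bool:
    """True if some rule authenticates with Cognito and then forwards to a target group."""
    for rule in rules_response.get("Rules", []):
        action_types = {action.get("Type") for action in rule.get("Actions", [])}
        if {"authenticate-cognito", "forward"} <= action_types:
            return True
    return False


# Usage against a real account (requires boto3 and credentials; alb_arn is a placeholder):
#   elbv2 = boto3.client("elbv2")
#   for arn in https_listener_arns(elbv2.describe_listeners(LoadBalancerArn=alb_arn)):
#       print(arn, has_cognito_forward_rule(elbv2.describe_rules(ListenerArn=arn)))
```

An empty list from `https_listener_arns` would match the "no HTTPS listener" symptom described above; a listener whose rules never forward would match the missing target group.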

@theofpa
Member

theofpa commented Feb 11, 2020

There are some more details on how to set up Cognito with Route 53, Certificate Manager and Istio in an end-to-end guide for AWS that I'm working on.

@sarahmaddox
Contributor

Related issue: #1541

@Jeffwan
Member

Jeffwan commented Feb 11, 2020

@theofpa I appreciate the help. Please include me in the PR and I can help review it.

@Jeffwan
Member

Jeffwan commented Feb 12, 2020

@karlschriek @dilzeem Please check whether the end-to-end guide addresses your concerns. If not, we can file a separate PR to address them.

@karlschriek
Author

Cool, this looks fairly comprehensive. Once a stable version of 1.0 is released I'll go through the steps as described. (I did manage to get it all up and running on 0.7.1, so will leave that as it is for now).

@Jeffwan
Member

Jeffwan commented Feb 13, 2020

@karlschriek @dilzeem BTW, authorization has been added in kubeflow/manifests#908. Do you usually use Cognito with your IdP, or OIDC directly?

@dilzeem

dilzeem commented Feb 13, 2020 via email

@Jeffwan
Copy link
Member

Jeffwan commented Feb 13, 2020

@dilzeem It would be great if you could file a new issue, and I will help resolve this problem. If the configs are set correctly, an HTTPS 443 listener should be added. The target group still uses HTTP for now; a more secure setup has been addressed here: https://github.com/kubeflow/manifests/pull/653/files

@dilzeem

dilzeem commented Feb 13, 2020 via email

@karlschriek
Author

For the moment we simply create and manage user pools via Cognito. I'm not sure I entirely follow what kubeflow/manifests#908 is doing. Would that allow managing authorisation directly within Kubeflow (i.e. no need to use Cognito)?

@Jeffwan
Member

Jeffwan commented Feb 23, 2020

> For the moment we simply create and manage user pools via Cognito. I'm not sure I entirely follow what kubeflow/manifests#908 is doing. Would that allow managing authorisation directly within Kubeflow (i.e. no need to use Cognito)?

@karlschriek
You can use either Cognito or OIDC on the ALB. It depends entirely on where your user pool lives and how you'd like to connect your IdP to Kubeflow. For example, if you use GitHub or Google as the IdP, it makes no difference; most enterprise identity providers are supported by both. If you'd like to use AWS Organizations or AWS SSO, I think Cognito may bring more features.

That feature brings authorization support and appends a new header, kubeflow-userid: ${your_user_email}, to each HTTP request. A few components, such as the central dashboard, use it to provide isolation.
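To illustrate where that header value can come from: the ALB passes the signed-in user's claims to the backend in the x-amzn-oidc-data header as a JWT (header.payload.signature, base64url-encoded), and the aws-istio-authz-adaptor parameters in the KfDef posted later in this thread (origin-header: x-amzn-oidc-data, custom-header: kubeflow-userid) suggest the adaptor copies a claim from one header into the other. The following is a hypothetical Python sketch of that mapping, not the adaptor's actual code. It assumes the claim is named `email`, and it deliberately does not verify the signature; a real implementation must verify the ES256 signature against the ALB's regional public key before trusting any claim.

```python
import base64
import json
from typing import Any, Dict


def _b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment whether or not it carries '=' padding."""
    segment = segment.rstrip("=")
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def unverified_claims(oidc_data: str) -> Dict[str, Any]:
    """Return the JWT payload from x-amzn-oidc-data WITHOUT checking the signature."""
    parts = oidc_data.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    return json.loads(_b64url_decode(parts[1]))


def userid_header(oidc_data: str, claim: str = "email") -> Dict[str, str]:
    """Copy one claim (assumed here to be 'email') into the kubeflow-userid header."""
    return {"kubeflow-userid": unverified_claims(oidc_data)[claim]}
```

Padding is normalized in `_b64url_decode` so the same code accepts segments with or without trailing `=` characters.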

@Jeffwan
Member

Jeffwan commented Feb 23, 2020

Revisiting this issue:

> The example shows registering a custom domain (www.shanjiaxin.com in this case), but then in the Amazon Cognito app client settings, this is entered as www.shanjiaxin.com/oauth2/idresponse. Why isn't it just www.shanjiaxin.com? This should be explained!

I think the Cognito documentation has more details on this. Since we use the ALB ingress controller to provision the ALB, I also found some docs there:
https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/cognito/setup/#cognitio-configuration

In this example, we add users directly in Cognito. In a real-world environment, we usually connect Cognito to another IdP, which would make the tutorial more complicated. Please check this guidance to set up Cognito:
https://docs.aws.amazon.com/cognito/latest/developerguide/cognito-user-pools-social-idp.html
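To expand on why the path is there: the ALB, not Kubeflow, completes the sign-in flow. After the user signs in, Cognito redirects the browser to the registered callback with an authorization code, and the ALB expects that redirect on the fixed path /oauth2/idpresponse of the hostname the user came in on (the custom domain, or the ALB DNS name). So the callback is the domain plus that path, over HTTPS. A minimal Python sketch with hypothetical helper names, which builds the value and flags the mistakes discussed in this thread (such as idresponse instead of idpresponse):

```python
from typing import List
from urllib.parse import urlsplit

# Fixed path on which the ALB receives the authorization code from Cognito.
ALB_CALLBACK_PATH = "/oauth2/idpresponse"


def callback_url(domain: str) -> str:
    """Callback URL to enter in the Cognito app client settings for this domain."""
    host = domain.strip().rstrip("/")
    if "://" in host:
        host = urlsplit(host).netloc  # accept "https://host" as well as "host"
    return f"https://{host}{ALB_CALLBACK_PATH}"


def check_callback_url(url: str) -> List[str]:
    """Return the problems found in a callback URL copied from the app client settings."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("scheme must be https: ALB authentication runs on the HTTPS listener")
    if parts.path != ALB_CALLBACK_PATH:
        problems.append(f"path is {parts.path!r}, expected {ALB_CALLBACK_PATH!r}")
    return problems
```

For example, `callback_url("www.shanjiaxin.com")` returns https://www.shanjiaxin.com/oauth2/idpresponse, while `check_callback_url` reports a wrong path for https://www.shanjiaxin.com/oauth2/idresponse.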

> Under Cognito Domain Name the value kubeflow-testing is entered. In the example YAML snippet below you have cognitoUserPoolDomain: your-user-pool. As far as I can tell, this should be cognitoUserPoolDomain: kubeflow-testing to be consistent with the rest of the example.

I agree that your-user-pool is misleading. It should be either the Amazon Cognito domain or your own domain, as this post does.

> For adding a CNAME, the Points-To value appears to be the ALB endpoint. It should be explained how this endpoint can be found and whether it should be entered exactly as shown. Since most users will probably use Route 53 for this, a guide that shows the steps for Route 53 would be sensible.

I think we can either switch to using Route 53 to manage the domain or add more details here.
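If Route 53 manages the domain, the CNAME from the guide can be created with the Route 53 ChangeResourceRecordSets API. The sketch below only builds the ChangeBatch structure that boto3's `change_resource_record_sets` expects; the record name, the ALB DNS name and the 300-second TTL are placeholders, and the helper name is hypothetical. The ALB DNS name is the ADDRESS shown by `kubectl get ingress -n istio-system`, entered exactly as shown. A CNAME cannot be created at the zone apex (example.com itself), so use a subdomain such as kubeflow.example.com, or an alias A record instead.

```python
from typing import Any, Dict


def cname_change_batch(record_name: str, alb_dns_name: str, ttl: int = 300) -> Dict[str, Any]:
    """ChangeBatch that points record_name at the Kubeflow ALB via a CNAME."""
    return {
        "Comment": f"Point {record_name} at the Kubeflow ALB",
        "Changes": [
            {
                # UPSERT creates the record, or updates it if the ALB is recreated.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": alb_dns_name}],
                },
            }
        ],
    }


# Usage (requires boto3 and credentials; zone_id and names are placeholders):
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId=zone_id,
#       ChangeBatch=cname_change_batch("kubeflow.example.com", alb_dns_name))
```

Using UPSERT rather than CREATE is a design choice: the same call keeps working if the ALB is deleted and recreated with a new DNS name.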

@karlschriek
Author

karlschriek commented Feb 24, 2020

Thanks. The explanations in the new "end-to-end" guide are much easier to follow!

> That feature brings authorization support and appends a new header, kubeflow-userid: ${your_user_email}, to each HTTP request. A few components, such as the central dashboard, use it to provide isolation.

OK, that is something we have definitely been waiting for! How should I imagine this working? If I log in with a user "[email protected]", will the central dashboard then have an isolated namespace for this specific user, as opposed to using "anonymous" or some other arbitrary namespace?

@Jeffwan
Member

Jeffwan commented Feb 24, 2020

  1. The cluster admin should create a profile like this. The profile controller will then create a namespace karl, and RBAC will recognize the kubeflow-userid header with the value [email protected]:
apiVersion: kubeflow.org/v1
kind: Profile
metadata:
  name: karl
spec:
  owner:
    kind: User
    name: [email protected]
  2. karl logs in to the central dashboard. Since the auth adaptor can retrieve the email from the user claims, it appends the header. The central dashboard notices that this user is authenticated and has a workgroup, and then uses the user's namespace rather than anonymous.
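When onboarding several users, Profiles like the one above can be generated rather than written by hand. A minimal sketch, assuming (as described above) that the kubeflow-userid header carries the user's email, so the owner name must be that exact email; the namespaces and addresses are hypothetical. `api_version` defaults to the kubeflow.org/v1 used in the snippet above, but check which version your cluster's Profile CRD serves (the profile posted later in this thread uses kubeflow.org/v1beta1).

```python
import json
from typing import Dict


def profile_manifest(namespace: str, owner_email: str,
                     api_version: str = "kubeflow.org/v1") -> Dict:
    """Build a Profile whose owner must match the kubeflow-userid header value."""
    # Assumption: the header carries an email, so a bare name like "user1" would never match.
    if "@" not in owner_email:
        raise ValueError(f"owner {owner_email!r} does not look like an email address")
    return {
        "apiVersion": api_version,
        "kind": "Profile",
        "metadata": {"name": namespace},  # the profile controller creates this namespace
        "spec": {"owner": {"kind": "User", "name": owner_email}},
    }


def profiles_list(users: Dict[str, str]) -> Dict:
    """Wrap one Profile per namespace -> email entry in a v1 List for kubectl apply."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [profile_manifest(ns, email) for ns, email in sorted(users.items())],
    }


if __name__ == "__main__":
    # Hypothetical user; pipe the JSON output to: kubectl apply -f -
    print(json.dumps(profiles_list({"karl": "karl@example.com"}), indent=2))
```

kubectl accepts JSON as well as YAML, which is why the sketch emits JSON and avoids a YAML dependency.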

@Can-Sahin

Not working for me either!

I followed the docs and couldn't get ALB authentication working with Cognito. It constantly returns 401 when Cognito calls the callback with https://xxxxx.eu-west-1.elb.amazonaws.com/oauth2/idpresponse?code=xxxxxx, which should work according to the docs.

I am also using an unrelated pre-existing certArn, and the docs say the browser will warn but it will still work. Could the invalid certificate be the problem? I've more or less debugged the other factors.

The ALB has Cognito correctly configured as a rule (I checked the console).
I created another rule with HTTP 80 just to test whether the pods are actually working, and calling the load balancer directly over HTTP opens the web UI as intended.

Currently stuck :( Any ideas?
PS: using kfctl_aws_cognito.v1.0.2.yaml

@Jeffwan
Member

Jeffwan commented Apr 22, 2020

@Can-Sahin An unrelated certArn should be fine to get the ALB working, but the certificate is invalid, so the browser will definitely complain about it.

It would help if you could share your configuration so we can help debug.

@Can-Sahin

Can-Sahin commented Apr 23, 2020

This is my config YAML:

apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  annotations:
    kfctl.kubeflow.io/force-delete: "false"
  clusterName: kubeflow-test.eu-west-1.eksctl.io
  creationTimestamp: null
  name: kubeflow-test
  namespace: kubeflow
spec:
  applications:
  - kustomizeConfig:
      parameters:
      - name: namespace
        value: istio-system
      repoRef:
        name: manifests
        path: istio/istio-crds
    name: istio-crds
  - kustomizeConfig:
      parameters:
      - name: namespace
        value: istio-system
      repoRef:
        name: manifests
        path: istio/istio-install
    name: istio-install
  - kustomizeConfig:
      parameters:
      - name: namespace
        value: istio-system
      repoRef:
        name: manifests
        path: istio/cluster-local-gateway
    name: cluster-local-gateway
  - kustomizeConfig:
      parameters:
      - name: clusterRbacConfig
        value: "OFF"
      repoRef:
        name: manifests
        path: istio/istio
    name: istio
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: application/application-crds
    name: application-crds
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: application/application
    name: application
  - kustomizeConfig:
      parameters:
      - name: namespace
        value: cert-manager
      repoRef:
        name: manifests
        path: cert-manager/cert-manager-crds
    name: cert-manager-crds
  - kustomizeConfig:
      parameters:
      - name: namespace
        value: kube-system
      repoRef:
        name: manifests
        path: cert-manager/cert-manager-kube-system-resources
    name: cert-manager-kube-system-resources
  - kustomizeConfig:
      overlays:
      - self-signed
      - application
      parameters:
      - name: namespace
        value: cert-manager
      repoRef:
        name: manifests
        path: cert-manager/cert-manager
    name: cert-manager
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: metacontroller
    name: metacontroller
  - kustomizeConfig:
      overlays:
      - istio
      - application
      repoRef:
        name: manifests
        path: argo
    name: argo
  - kustomizeConfig:
      repoRef:
        name: manifests
        path: kubeflow-roles
    name: kubeflow-roles
  - kustomizeConfig:
      overlays:
      - istio
      - application
      parameters:
      - name: userid-header
        value: kubeflow-userid
      repoRef:
        name: manifests
        path: common/centraldashboard
    name: centraldashboard
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: admission-webhook/webhook
    name: webhook
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: webhookNamePrefix
        value: admission-webhook-
      repoRef:
        name: manifests
        path: admission-webhook/bootstrap
    name: bootstrap
  - kustomizeConfig:
      overlays:
      - istio
      - application
      parameters:
      - name: userid-header
        value: kubeflow-userid
      repoRef:
        name: manifests
        path: jupyter/jupyter-web-app
    name: jupyter-web-app
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: spark/spark-operator
    name: spark-operator
  - kustomizeConfig:
      overlays:
      - istio
      - application
      - db
      repoRef:
        name: manifests
        path: metadata
    name: metadata
  - kustomizeConfig:
      overlays:
      - istio
      - application
      repoRef:
        name: manifests
        path: jupyter/notebook-controller
    name: notebook-controller
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pytorch-job/pytorch-job-crds
    name: pytorch-job-crds
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pytorch-job/pytorch-operator
    name: pytorch-operator
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: namespace
        value: knative-serving
      repoRef:
        name: manifests
        path: knative/knative-serving-crds
    name: knative-crds
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: namespace
        value: knative-serving
      repoRef:
        name: manifests
        path: knative/knative-serving-install
    name: knative-install
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: kfserving/kfserving-crds
    name: kfserving-crds
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: kfserving/kfserving-install
    name: kfserving-install
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: usageId
        value: "5459673799330546546"
      - name: reportUsage
        value: "true"
      repoRef:
        name: manifests
        path: common/spartakus
    name: spartakus
  - kustomizeConfig:
      overlays:
      - istio
      repoRef:
        name: manifests
        path: tensorboard
    name: tensorboard
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: tf-training/tf-job-crds
    name: tf-job-crds
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: tf-training/tf-job-operator
    name: tf-job-operator
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: katib/katib-crds
    name: katib-crds
  - kustomizeConfig:
      overlays:
      - application
      - istio
      repoRef:
        name: manifests
        path: katib/katib-controller
    name: katib-controller
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/api-service
    name: api-service
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: minioPvName
        value: minio-pv
      - name: minioPvcName
        value: minio-pv-claim
      repoRef:
        name: manifests
        path: pipeline/minio
    name: minio
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: mysqlPvName
        value: mysql-pv
      - name: mysqlPvcName
        value: mysql-pv-claim
      repoRef:
        name: manifests
        path: pipeline/mysql
    name: mysql
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/persistent-agent
    name: persistent-agent
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/pipelines-runner
    name: pipelines-runner
  - kustomizeConfig:
      overlays:
      - istio
      - application
      repoRef:
        name: manifests
        path: pipeline/pipelines-ui
    name: pipelines-ui
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/pipelines-viewer
    name: pipelines-viewer
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/scheduledworkflow
    name: scheduledworkflow
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: pipeline/pipeline-visualization-service
    name: pipeline-visualization-service
  - kustomizeConfig:
      overlays:
      - application
      - istio
      parameters:
      - name: userid-header
        value: kubeflow-userid
      repoRef:
        name: manifests
        path: profiles
    name: profiles
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: seldon/seldon-core-operator
    name: seldon-core
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: mpi-job/mpi-operator
    name: mpi-operator
  - kustomizeConfig:
      overlays:
      - cognito
      parameters:
      - name: namespace
        value: istio-system
      - name: CognitoUserPoolArn
        value: arn:aws:cognito-idp:eu-west-1:xxxx:userpool/eu-west-1_xxxx
      - name: CognitoUserPoolDomain
        value: kubeflow1
      - name: CognitoAppClientId
        value: xxxxxx
      - name: certArn
        value: arn:aws:acm:eu-west-1:xxx:certificate/69f43fcf-4303-4747-aa5b-5xxxxxx
      repoRef:
        name: manifests
        path: aws/istio-ingress
    name: istio-ingress
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: namespace
        value: istio-system
      - name: origin-header
        value: x-amzn-oidc-data
      - name: custom-header
        value: kubeflow-userid
      repoRef:
        name: manifests
        path: aws/aws-istio-authz-adaptor
    name: aws-istio-authz-adaptor
  - kustomizeConfig:
      overlays:
      - application
      parameters:
      - name: clusterName
        value: kubeflow-test
      repoRef:
        name: manifests
        path: aws/aws-alb-ingress-controller
    name: aws-alb-ingress-controller
  - kustomizeConfig:
      overlays:
      - application
      repoRef:
        name: manifests
        path: aws/nvidia-device-plugin
    name: nvidia-device-plugin
  plugins:
  - kind: KfAwsPlugin
    metadata:
      creationTimestamp: null
      name: aws
    spec:
      auth:
        cognito:
          certArn: arn:aws:acm:eu-west-1:xxxx:certificate/69f43fcf-4303-4747-aa5b-xxxxx
          cognitoAppClientId: xxxx
          cognitoUserPoolArn: arn:aws:cognito-idp:eu-west-1:xxx:userpool/eu-west-xxx
          cognitoUserPoolDomain: kubeflow1
      region: eu-west-1
      roles:
      - eksctl-kubeflow-test-nodegroup-ng-NodeInstanceRole-1UPCQVKH9X8UH
  repos:
  - name: manifests
    uri: https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz
  version: v1.0.2
status:
  reposCache:
  - localPath: '"/Users/cansahin/Desktop/kubeflow/deployments/kubeflow-test/.cache/manifests/manifests-1.0.2"'
    name: manifests
  - localPath: '"/Users/cansahin/Desktop/kubeflow/deployments/kubeflow-test/.cache/manifests/manifests-1.0.2"'
    name: manifests
  - localPath: '".cache/manifests/manifests-1.0.2"'
    name: manifests
  - localPath: '".cache/manifests/manifests-1.0.2"'
    name: manifests
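The Cognito and certificate settings appear twice in this KfDef: as kustomize parameters of the istio-ingress application and under the KfAwsPlugin's spec.auth.cognito. A quick sanity check is to confirm that both copies agree, for example after changing certArn. The sketch below works on the KfDef once it has been loaded into a dict (for example with PyYAML's `yaml.safe_load`, an assumed dependency); the helper name is hypothetical, and which copy kfctl treats as authoritative is not something this thread establishes.

```python
from typing import Any, Dict, List

# kustomize parameter of the istio-ingress application -> key under
# spec.plugins[kind=KfAwsPlugin].spec.auth.cognito, as named in the KfDef above.
PARAM_TO_PLUGIN_KEY = {
    "CognitoUserPoolArn": "cognitoUserPoolArn",
    "CognitoUserPoolDomain": "cognitoUserPoolDomain",
    "CognitoAppClientId": "cognitoAppClientId",
    "certArn": "certArn",
}


def cognito_mismatches(kfdef: Dict[str, Any]) -> List[str]:
    """Report Cognito/ACM values that differ between istio-ingress and KfAwsPlugin."""
    spec = kfdef["spec"]
    ingress = next((a for a in spec["applications"] if a["name"] == "istio-ingress"), None)
    plugin = next((p for p in spec.get("plugins", []) if p["kind"] == "KfAwsPlugin"), None)
    if ingress is None or plugin is None:
        raise ValueError("KfDef has no istio-ingress application or KfAwsPlugin")
    params = {p["name"]: p["value"] for p in ingress["kustomizeConfig"].get("parameters", [])}
    cognito = plugin["spec"]["auth"]["cognito"]
    problems = []
    for param, key in PARAM_TO_PLUGIN_KEY.items():
        if params.get(param) != cognito.get(key):
            problems.append(
                f"{param}={params.get(param)!r} but auth.cognito.{key}={cognito.get(key)!r}")
    return problems


# Usage (PyYAML assumed installed; file name as mentioned in this thread):
#   import yaml
#   with open("kfctl_aws_cognito.v1.0.2.yaml") as f:
#       print(cognito_mismatches(yaml.safe_load(f)))
```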

After that, I also created profile.yaml:

apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: user1
spec:
  owner:
    kind: User
    name: user1

The problem is that even though the ALB has a rule to authenticate with Cognito, when I browse to the ALB address it just keeps loading forever. Shouldn't it redirect to Cognito?

I set the callback URL of my sign-in page as described: https://DNS_NAME/oauth2/idpresponse

After sign-in, it calls https://DNS_NAME/oauth2/idpresponse?code=SOME_CODE, which always returns 401.

The ALB ingress controller pod logs show basically nothing; they only contain the logs written during installation.

I'm really stuck. Thanks a lot!

Edit: typo

@Can-Sahin

There are some other bugs I saw while debugging this.

  1. kfctl delete does not delete the ALB and target groups, so deleting the cluster also fails, since these are still using the VPC.

  2. Changing certArn and running kfctl build and then kfctl deploy does not change the certArn on the ALB; it still logs the old certArn. I had to uninstall Kubeflow and reinstall it.

@Jeffwan
Member

Jeffwan commented Apr 24, 2020

@Can-Sahin Please check kubeflow website. The configuration details are there.

> kfctl delete does not delete the ALB and target groups, so deleting the cluster also fails, since these are still using the VPC.

kfctl does have logic to delete istio-ingress first, after which the ALB ingress controller deletes the ALB. What do the deletion logs say?

> Changing certArn and running kfctl build and then kfctl deploy does not change the certArn on the ALB; it still logs the old certArn. I had to uninstall Kubeflow and reinstall it.

Can you try kfctl apply?

@Can-Sahin

I meant kfctl apply (not kfctl deploy). It doesn't update the certArn. I also manually deleted the pod with kubectl delete xxx and checked the logs afterward, and it still spawns with the old certArn.

I saw the ALB deletion logic in Kubeflow, but after the logs say Waiting xx seconds for ALB to delete (or something similar), it continues to remove the other parts, yet the ALB survives. No pod logs are written either.

However, these can be separate issues. They are minor compared to what I am struggling with now: I am blocked by the 401 error. After 2 days of non-stop trial and error and debugging, I gave up on setting up Kubeflow with Cognito. I will try older Kubeflow versions sometime.

Is there any other way of debugging this problem besides the kubectl logs ... command, so that I can find out what's wrong? I'm not experienced with Kubernetes.
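One check that does not depend on pod logs is to look at what the ALB returns before any sign-in. With the Cognito rule in place, an unauthenticated request should get a redirect to the Cognito hosted UI whose redirect_uri is https://<host>/oauth2/idpresponse; a request that hangs instead (as described earlier in this thread) may point at the listener or its security group rather than at Cognito. The sketch below uses only the Python standard library: it fetches the first response without following redirects and checks the Location header. It assumes an Amazon Cognito prefix domain (*.amazoncognito.com) and that the ALB sends the standard OAuth 2.0 redirect_uri parameter; the function names are hypothetical, and certificate verification is disabled only because this setup uses an unrelated certArn.

```python
import ssl
import urllib.error
import urllib.request
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, urlsplit


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Make urllib surface a 3xx response instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def first_response(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Return (status, Location header) of the first response from the ALB.

    A socket timeout here matches the "keeps loading forever" symptom.
    """
    ctx = ssl.create_default_context()
    # Only because the certificate in this setup does not match the ALB hostname.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        _NoRedirect, urllib.request.HTTPSHandler(context=ctx))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def check_cognito_redirect(location: Optional[str], alb_host: str) -> List[str]:
    """List problems with the redirect the ALB sends unauthenticated users to."""
    if not location:
        return ["no redirect: the ALB did not send the request to Cognito"]
    problems = []
    parts = urlsplit(location)
    if not parts.netloc.endswith(".amazoncognito.com"):
        problems.append(f"redirects to {parts.netloc!r}, not a Cognito prefix domain")
    if parts.path != "/oauth2/authorize":
        problems.append(f"unexpected authorize path {parts.path!r}")
    redirect_uri = parse_qs(parts.query).get("redirect_uri", [""])[0]
    expected = f"https://{alb_host}/oauth2/idpresponse"
    if redirect_uri != expected:
        problems.append(f"redirect_uri is {redirect_uri!r}, expected {expected!r}")
    return problems
```

For example, `status, location = first_response("https://xxxxx.eu-west-1.elb.amazonaws.com/")` followed by `check_cognito_redirect(location, "xxxxx.eu-west-1.elb.amazonaws.com")`; an empty list means the redirect side looks consistent, which narrows the 401 down to the code exchange after sign-in.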

@theofpa
Member

theofpa commented Apr 25, 2020

> I use callback url to my signin page as described: https://DNS_NAME/oauth2/idresponse
>
> and after the signin it calls with https://DNS_NAME/oauth2/idresponse?code=SOME_CODE which always says 401 as the response.

@Can-Sahin the callback url you are using has a typo, instead of /oauth2/idresponse it should be /oauth2/idpresponse.

(edit oauth2)

@Can-Sahin

Can-Sahin commented Apr 25, 2020

Woow can't be serious 😟

Both this and this say oauth2 in the URL. If it's really a URL mistake, then the docs need to be updated.

I will try on Monday, since I deleted my cluster and don't want to spend my weekend on it.

PS: The AWS docs also say oauth2.

@theofpa
Member

theofpa commented Apr 25, 2020

I wanted to focus on the idpresponse typo

@Can-Sahin

I made a typo here, sorry; I just realized. It is idpresponse in my Cognito settings anyway. I checked it 10 times while debugging. Sorry about that.

@Jeffwan
Member

Jeffwan commented Apr 26, 2020

I think @theofpa contributed the e2e docs, and I also made some improvements there. They have all the screenshots and required info to launch a secure cluster. I will close this issue. Feel free to reopen it if it's still a problem.

Jeffwan closed this as completed on Apr 26, 2020
@summerisc

I am seeing the same 401 error that @Can-Sahin saw. Are there any updates on this issue? I'd appreciate the help.
