
Cannot run metrics helper with IAM roles mapped to k8s service accounts #663

Closed
miguelaferreira opened this issue Oct 18, 2019 · 22 comments

Comments

@miguelaferreira

I've set up my EKS cluster to block pod access to the EC2 metadata endpoint and instead obtain IAM policies via roles mapped to service accounts (via OpenID Connect).

It turns out that the CNI metrics helper tries to reach that endpoint. Since I'm blocking pod access to the EC2 metadata endpoint with (Calico) network policies, I've allowed that one pod (the metrics helper) to reach the endpoint. However, once the pod can reach the EC2 metadata endpoint, it assumes the worker node role instead of the role I created for it.

I'm stuck between using IAM roles mapped to k8s service accounts and running the metrics helper. Is there a way to have both?

@miguelaferreira miguelaferreira changed the title Cannot run metrics helper with service accounts and OpenID Connect Cannot run metrics helper with IAM roles mapped to k8s service accounts Oct 18, 2019
@davidshin

@miguelaferreira Were you actually able to get the IAM roles via OIDC working? The documentation here (https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-minimum-sdk.html) says that the minimum Go SDK version is 1.23.13, but I believe amazon-vpc-cni-k8s is using 1.21.7 (https://github.com/aws/amazon-vpc-cni-k8s/blob/master/go.mod)
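To make the SDK-version point concrete, here is a simplified, hypothetical model (not aws-sdk-go code) of why a pod that can reach IMDS ends up with the worker role. It assumes a credential chain that tries static environment keys, then IRSA web identity (only if the SDK is new enough, per the 1.23.13 minimum cited above), then the EC2 instance role via IMDS; the real chain has further steps such as shared config profiles and container credentials. All type and function names are invented for illustration.

```go
package main

import "fmt"

// credentialSource names where this simplified, hypothetical chain
// obtained credentials. It is NOT the AWS SDK's implementation.
type credentialSource string

const (
	sourceEnvKeys     credentialSource = "static keys from environment"
	sourceWebIdentity credentialSource = "IRSA web identity role"
	sourceEC2Role     credentialSource = "EC2 instance (worker node) role via IMDS"
	sourceNone        credentialSource = "no credentials"
)

// podEnv is a hypothetical snapshot of what the pod can see.
type podEnv struct {
	HasEnvKeys       bool   // AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set
	RoleARN          string // AWS_ROLE_ARN
	WebIdentityToken string // AWS_WEB_IDENTITY_TOKEN_FILE
	SDKSupportsIRSA  bool   // assumption: SDK is new enough for web identity
	IMDSReachable    bool   // network policy allows 169.254.169.254
}

// resolve walks the simplified chain in order and returns the first
// source that applies.
func resolve(e podEnv) credentialSource {
	switch {
	case e.HasEnvKeys:
		return sourceEnvKeys
	case e.SDKSupportsIRSA && e.RoleARN != "" && e.WebIdentityToken != "":
		return sourceWebIdentity
	case e.IMDSReachable:
		return sourceEC2Role
	default:
		return sourceNone
	}
}

func main() {
	pod := podEnv{
		RoleARN:          "arn:aws:iam::1234567890:role/example",
		WebIdentityToken: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token",
		IMDSReachable:    true,
	}
	// Older SDK: the IRSA variables are ignored, so the worker role wins.
	fmt.Println(resolve(pod))
	// Newer SDK: web identity is tried before falling back to IMDS.
	pod.SDKSupportsIRSA = true
	fmt.Println(resolve(pod))
}
```

In this model, blocking IMDS only changes the last fallback; whether the IRSA role is used depends on the SDK understanding the web identity variables.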

@miguelaferreira
Author

miguelaferreira commented Oct 25, 2019

Yes, I got it working. Instead of the manifest from that same documentation page (step 3 under "For all other Kubernetes versions"), I used another one that pulls in version v1.5.4.

I've made a terraform module that is able to upgrade the CNI plugin that is installed by default in EKS, and another one that sets up the IAM side of things.

@mogren mogren added bug and removed question labels Oct 25, 2019
@mogren
Contributor

mogren commented Oct 29, 2019

@miguelaferreira There is an issue with ip rules going missing in v1.5.4 (#641), please try the v1.5.5 release candidate instead.

@miguelaferreira
Author

I've tried that version, @mogren, but I still get the same output.

With a network policy blocking access to the EC2 metadata endpoint, the pod complains that it needs that access:

...
E1029 09:59:22.901290       1 cni-metrics-helper.go:99] Failed to create publisher: publisher: unable to obtain EC2 service client: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Without the network policy blocking access to the EC2 metadata endpoint, the pod assumes the role of the worker node and then complains because that role does not have access to CloudWatch:

...
E1029 10:06:08.331326       1 publisher.go:173] Unable to publish CloudWatch metrics: AccessDenied: User: arn:aws:sts::111111111111:assumed-role/cluster-worker-123050886700000005/i-04eXXXXXXX22 is not authorized to perform: cloudwatch:PutMetricData

@mogren
Contributor

mogren commented Nov 5, 2019

@miguelaferreira Oh, did you add that permission though? It's not available in the managed CNI policy by default. See https://docs.aws.amazon.com/eks/latest/userguide/cni-metrics-helper.html#install-metrics-helper for details

@miguelaferreira
Author

miguelaferreira commented Nov 5, 2019

@mogren I'm not sure what permission you are referring to. But if that's the policy to allow the pod to call cloudwatch:PutMetricData, then yes I have put that policy in a role that I assign to the SA that runs the pod (according to instructions here https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html).

Using the role for the SA, I have to block access to the EC2 metadata endpoint; otherwise the pod assumes the role of the worker node (arn:aws:sts::111111111111:assumed-role/cluster-worker-123050886700000005/i-04eXXXXXXX22), which is not allowed to call cloudwatch:PutMetricData. However, when I block access to the EC2 metadata endpoint (and the pod assumes the correct role, which is allowed to call cloudwatch:PutMetricData), the pod complains that it cannot reach the EC2 metadata endpoint.

Does that clarify the problem?

@mogren
Contributor

mogren commented Nov 5, 2019

Ah, thanks @miguelaferreira for the explanation. This requires some more work from our side.

@miguelaferreira
Author

@mogren is there any progress towards supporting running the metrics helper with IAM roles mapped to k8s service accounts?

@mogren
Contributor

mogren commented Jan 15, 2020

@miguelaferreira Sorry, not yet, but thanks for pinging me about it. Similar changes should be done to the ipamd pod (aws-node) as well.

@miguelaferreira
Author

@mogren I was checking back on this issue when I re-read your comment. I'm not sure what needs to change in the ipamd pod but I can confirm it works perfectly with IAM roles mapped to k8s service accounts. I have the metadata endpoint blocked on my cluster and the ipamd pods are using the role I assign to them.

# extract of aws-node pod manifest
containers:
- env:
  - name: AWS_VPC_K8S_CNI_LOGLEVEL
    value: DEBUG
  - name: MY_NODE_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: spec.nodeName
  - name: AWS_ROLE_ARN
    value: arn:aws:iam::1234567890:role/kube-system-aws-node
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
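One way to confirm from inside a container that IRSA is wired up is to check the two variables in the manifest extract above. The helper below is a hypothetical diagnostic sketch (irsaEnv is an invented name, not part of the CNI or the AWS SDK); it only checks that AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE are set and that the token file exists, and does not validate the role itself.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// irsaEnv is a hypothetical helper. getenv and exists are injected so the
// check can be exercised without a real pod environment.
func irsaEnv(getenv func(string) string, exists func(string) bool) (roleARN string, err error) {
	roleARN = getenv("AWS_ROLE_ARN")
	tokenFile := getenv("AWS_WEB_IDENTITY_TOKEN_FILE")
	if roleARN == "" || tokenFile == "" {
		return "", errors.New("AWS_ROLE_ARN or AWS_WEB_IDENTITY_TOKEN_FILE is not set")
	}
	if !exists(tokenFile) {
		return "", fmt.Errorf("token file %s not found", tokenFile)
	}
	return roleARN, nil
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	arn, err := irsaEnv(os.Getenv, fileExists)
	if err != nil {
		fmt.Println("IRSA not configured:", err)
		return
	}
	fmt.Println("IRSA role:", arn)
}
```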

@jaypipes
Contributor

@miguelaferreira Have you applied the same changes shown above (AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE) to the CNI metrics helper Deployment's spec.template.spec?

@miguelaferreira
Author

@jaypipes I'm not sure I understand what you are asking. But the way I have been doing this is to annotate a service account and then the pod spec gets extended with these extra env vars. I have done this consistently with several deployments in my cluster.
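The annotation-driven flow described here can be sketched as follows. This is a simplified, hypothetical model of the env-var mutation the EKS pod identity webhook performs, not its implementation: injectIRSAEnv and EnvVar are invented names, the token path is taken from the aws-node extract above, and the real webhook also adds a projected service account token volume and mount.

```go
package main

import "fmt"

// EnvVar mirrors one name/value entry in a container's env list.
type EnvVar struct {
	Name, Value string
}

// Assumptions for this sketch: roleAnnotation is the service-account
// annotation used for IRSA, and tokenPath matches the aws-node manifest
// quoted earlier in the thread.
const (
	roleAnnotation = "eks.amazonaws.com/role-arn"
	tokenPath      = "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
)

// injectIRSAEnv appends AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE when
// the pod's service account carries roleAnnotation.
func injectIRSAEnv(saAnnotations map[string]string, env []EnvVar) []EnvVar {
	arn, ok := saAnnotations[roleAnnotation]
	if !ok {
		return env // not an IRSA service account; env unchanged
	}
	out := append([]EnvVar(nil), env...) // copy so the caller's slice is untouched
	return append(out,
		EnvVar{Name: "AWS_ROLE_ARN", Value: arn},
		EnvVar{Name: "AWS_WEB_IDENTITY_TOKEN_FILE", Value: tokenPath},
	)
}

func main() {
	sa := map[string]string{roleAnnotation: "arn:aws:iam::1234567890:role/kube-system-aws-node"}
	env := []EnvVar{{Name: "AWS_VPC_K8S_CNI_LOGLEVEL", Value: "DEBUG"}}
	for _, e := range injectIRSAEnv(sa, env) {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

Because the variables come from the service account rather than the Deployment itself, any Deployment that uses the annotated service account gets them without editing its own spec.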

@jaypipes
Contributor

@miguelaferreira yes, sorry for being unclear.

@mogren I believe I have found the source of this problem.

Note that the CNI metrics helper instantiates the AWS SDK Session object differently than ipamd.

Here is the CNI metrics helper instantiating its Publisher's session:

awsSession := session.Must(session.NewSession())
// Get cluster-ID
ec2Client, err := ec2wrapper.NewMetricsClient()
if err != nil {
	return nil, errors.Wrap(err, "publisher: unable to obtain EC2 service client")
}
clusterID := getClusterID(ec2Client)
// Get CloudWatch client
ec2MetadataClient := ec2metadatawrapper.New(nil)
region, err := ec2MetadataClient.Region()
if err != nil {
	return nil, errors.Wrap(err, "publisher: Unable to obtain region")
}
cloudwatchClient := cloudwatch.New(awsSession, aws.NewConfig().WithMaxRetries(cloudwatchClientMaxRetries).WithRegion(region))

and here is where the Metrics client ends up instantiating its session:

metricsSession := session.Must(session.NewSession())
ec2MetadataClient := ec2metadatawrapper.New(nil)
instanceIdentityDocument, err := ec2MetadataClient.GetInstanceIdentityDocument()
if err != nil {
	return &EC2Wrapper{}, err
}
ec2ServiceClient := ec2.New(metricsSession, aws.NewConfig().WithMaxRetries(maxRetries).WithRegion(instanceIdentityDocument.Region))

Note that in the latter case, we call GetInstanceIdentityDocument(), which is defined here:

https://github.com/aws/aws-sdk-go/blob/e80315117c6955364974702b89f67d6f0a7247e3/aws/ec2metadata/api.go#L103

which queries IMDS for the instance-identity/document path.

I think GetInstanceIdentityDocument() and the difference between how the publisher and the metrics client set up their sessions are the source of the issue here.

/cc @micahhausler

@jayanthvn
Contributor

As @jaypipes mentioned, we need to look at the IPAMD and metrics helper code to understand how the session is set up. That should clarify why the behavior differs.

@janbeerden

Seems related to 1287. Any updates?

@kumarpmd

@jayanthvn Please let us know if there are any updates to this issue.

EKS cluster deployed with amazon-k8s-cni v1.9.0 and pods blocked from IMDS. cni-metrics-helper deploys, but reports an EC2MetadataRequestError failure.

cni-metrics-helper would be useful for monitoring the ENIs and IPs associated with the EKS deployment. Thank you.

E1213 22:07:27.596576       1 cni-metrics-helper.go:99] Failed to create publisher: publisher: unable to obtain EC2 service client: EC2MetadataRequestError: failed to get EC2 instance identity document
caused by: RequestError: send request failed
caused by: Get http://169.254.169.254/latest/dynamic/instance-identity/document: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

@jayanthvn
Contributor

@kumarpmd - This is addressed by #1715, which will be released as part of 1.10.2 and should fix your issue. We are working on the release.

@cgchinmay
Contributor

cgchinmay commented Dec 14, 2021

@kumarpmd Hi, we have released a private image to test this fix. Could you try the cni-metrics-helper image tag v1.10.2-rc1? You will also need to specify AWS_CLUSTER_ID as shown below and use IRSA so that the Region field is auto-injected.

# Source: cni-metrics-helper/templates/deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: cni-metrics-helper
  namespace: kube-system
  labels:
    k8s-app: cni-metrics-helper
spec:
  selector:
    matchLabels:
      k8s-app: cni-metrics-helper
  template:
    metadata:
      labels:
        k8s-app: cni-metrics-helper
    spec:
      containers:
      - env:
        - name: USE_CLOUDWATCH
          value: "true"
        - name: AWS_CLUSTER_ID
          value: "test-cluster"  
        name: cni-metrics-helper
        image: "<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/cni-metrics-helper:v1.10.2-rc1"
      serviceAccountName: xxxx

Refer to this README from the PR linked above for guidance: https://github.com/aws/amazon-vpc-cni-k8s/blob/af780320a81bab5fca5473ce22c61965aa18141a/cmd/cni-metrics-helper/README.md

@kumarpmd

Thank you, @cgchinmay. cni-metrics-helper:v1.10.2-rc1 with AWS_CLUSTER_ID was deployed in a cluster with pods blocked from IMDS, and cni-metrics-helper was able to generate metrics. Thank you again.

Should the vpc-cni version match the cni-metrics-helper version? I tested cni-metrics-helper:v1.10.2-rc1 with vpc-cni v1.9.0 on EKS 1.20, using the CNI's service account aws-node, and only deployed the role and bindings for aws-node. Let me know if this is a concern.

@cgchinmay
Contributor

Thanks for confirming, @kumarpmd. No, you don't need to; updating only cni-metrics-helper should be enough.

@jayanthvn
Contributor

PR 1715 is merged and will be part of the 1.10.2 release. The EKS documentation will be updated after the release, and the GitHub README has been updated.

@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

8 participants