aws: add support for EKS Pod Identity #9696

Open
wants to merge 7 commits into
base: master
from

@zhihonl zhihonl commented Dec 6, 2024

Fluent Bit does not publish telemetry to CloudWatch when EKS Pod Identity is the only credential source available, because EKS Pod Identity is not part of its credential provider chain.

This change updates Fluent Bit's AWS HTTP credential functions to read the token from the file named by the AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE environment variable, which is injected by the EKS Pod Identity webhook. The token is then used to query the endpoint specified in the AWS_CONTAINER_CREDENTIALS_FULL_URI environment variable and retrieve STS credentials.
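The flow described above can be sketched in Python as follows. This is a minimal illustration, not the C code in this PR: the function names are hypothetical, stripping whitespace from the token is an assumption of the sketch, and the response field names (`AccessKeyId`, `SecretAccessKey`, `Token`, `Expiration`) follow the JSON format used by AWS container credential endpoints.

```python
import json
import urllib.request
from typing import Callable, Dict, Mapping, Tuple


def read_text_file(path: str) -> str:
    """Read the whole token file as text."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def build_credentials_request(
    env: Mapping[str, str],
    read_file: Callable[[str], str] = read_text_file,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for the credentials request (hypothetical helper)."""
    url = env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    if not url:
        raise ValueError("AWS_CONTAINER_CREDENTIALS_FULL_URI is not set")

    headers: Dict[str, str] = {}
    token_file = env.get("AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE")
    if token_file:
        # Read the file on every call so a rotated token is picked up.
        # Stripping surrounding whitespace is an assumption of this sketch.
        headers["Authorization"] = read_file(token_file).strip()
    return url, headers


def parse_credentials(body: str) -> Dict[str, str]:
    """Extract the temporary credentials from the endpoint's JSON response."""
    doc = json.loads(body)
    return {
        "access_key_id": doc["AccessKeyId"],
        "secret_access_key": doc["SecretAccessKey"],
        "session_token": doc["Token"],
        "expiration": doc["Expiration"],
    }


def fetch_credentials(env: Mapping[str, str]) -> Dict[str, str]:
    """Perform the HTTP GET; only works where the endpoint is reachable."""
    url, headers = build_credentials_request(env)
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=5) as response:
        return parse_credentials(response.read().decode("utf-8"))
```

Inside a pod, `fetch_credentials(os.environ)` would be the entry point. Request building and response parsing are kept separate from the network call so they can be exercised without a running Pod Identity agent.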

This PR is a replica of #9206 that merges the changes from @PettitWesley and @edsiper and makes the post-merge changes compatible with the master branch.

Addresses #8550


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
    Used the default configuration provided by the AWS CloudWatch Observability add-on:
[INPUT]
  Name                tail
  Tag                 application.*
  Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
  Path                /var/log/containers/*.log
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_container.db
  Mem_Buf_Limit       50MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Rotate_Wait         30
  storage.type        filesystem
  Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
  Name                tail
  Tag                 application.*
  Path                /var/log/containers/fluent-bit*
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_log.db
  Mem_Buf_Limit       5MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Read_from_Head      ${READ_FROM_HEAD}

[INPUT]
  Name                tail
  Tag                 application.*
  Path                /var/log/containers/cloudwatch-agent*
  multiline.parser    docker, cri
  DB                  /var/fluent-bit/state/flb_cwagent.db
  Mem_Buf_Limit       5MB
  Skip_Long_Lines     On
  Refresh_Interval    10
  Read_from_Head      ${READ_FROM_HEAD}

[FILTER]
  Name                aws
  Match               application.*
  ec2_instance_id     false
  az                  false

[FILTER]
  Name                kubernetes
  Match               application.*
  Kube_URL            https://kubernetes.default.svc:443
  Kube_Tag_Prefix     application.var.log.containers.
  Merge_Log           On
  Merge_Log_Key       log_processed
  K8S-Logging.Parser  On
  K8S-Logging.Exclude Off
  Labels              Off
  Annotations         Off
  Use_Kubelet         On
  Kubelet_Port        10250
  Buffer_Size         0

[OUTPUT]
  Name                cloudwatch_logs
  Match               application.*
  region              ${AWS_REGION}
  log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
  log_stream_prefix   ${HOST_NAME}-
  auto_create_group   true
  extra_user_agent    container-insights
  • Debug log output from testing the change

Log output after adding the required IAM permissions to the role associated with the pod identity

[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] cloudwatch:PutLogEvents: events=1210, payload=1046756 bytes
[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending log events to log stream ip-192-168-25-231.ec2.internal-application.var.log.containers.fluent-bit-xhhnj_amazon-cloudwatch_fl
[2024/12/06 20:17:53] [debug] [upstream] KA connection #218 to logs.us-east-1.amazonaws.com:443 has been assigned (recycled)
[2024/12/06 20:17:53] [debug] [http_client] not using http_proxy for header
[2024/12/06 20:17:53] [debug] [aws_credentials] Retrieving credentials from the HTTP provider..
[2024/12/06 20:17:53] [debug] [upstream] KA connection #220 to logs.us-east-1.amazonaws.com:443 is now available
[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=200
[2024/12/06 20:17:53] [debug] [upstream] KA connection #62 to logs.us-east-1.amazonaws.com:443 is now available
[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] PutLogEvents http status=200
[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Using stream=ip-192-168-25-231.ec2.internal-application.var.log.containers.cloudwatch-agent-hhxsw_amazon-cloudwatch_otc-container-e5
[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Using stream=ip-192-168-25-231.ec2.internal-application.var.log.containers.cloudwatch-agent-hhxsw_amazon-cloudwatch_otc-container-e5
[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Using stream=ip-192-168-25-231.ec2.internal-application.var.log.containers.cloudwatch-agent-hhxsw_amazon-cloudwatch_otc-container-e5
[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] cloudwatch:PutLogEvents: events=3, payload=3462 bytes
[2024/12/06 20:17:53] [debug] [output:cloudwatch_logs:cloudwatch_logs.0] Sending log events to log stream ip-192-168-25-231.ec2.internal-application.var.log.containers.cloudwatch-agent-hhxsw_amazon-cloudwa
[2024/12/06 20:17:53] [debug] [upstream] KA connection #216 to logs.us-east-1.amazonaws.com:443 has been assigned (recycled

Log output after removing the required IAM permissions from the role associated with the pod identity

[2024/12/06 20:22:16] [debug] [aws_credentials] Retrieving credentials from the HTTP provider..
[2024/12/06 20:22:16] [debug] [http_client] server logs.us-east-1.amazonaws.com:443 will close connection #221
[2024/12/06 20:22:16] [debug] [aws_client] logs.us-east-1.amazonaws.com: http_do=0, HTTP Status: 400
[2024/12/06 20:22:16] [debug] [output:cloudwatch_logs:cloudwatch_logs.2] PutLogEvents http status=400
[2024/12/06 20:22:16] [error] [output:cloudwatch_logs:cloudwatch_logs.2] PutLogEvents API responded with error='AccessDeniedException'
[2024/12/06 20:22:16] [error] [output:cloudwatch_logs:cloudwatch_logs.2] Failed to send log events
[2024/12/06 20:22:16] [error] [output:cloudwatch_logs:cloudwatch_logs.2] Failed to send log events
[2024/12/06 20:22:16] [error] [output:cloudwatch_logs:cloudwatch_logs.2] Failed to send events
  • Attached Valgrind output that shows no leaks or memory corruption was found
    Ran Valgrind against the HTTP credential tests

Command

valgrind --leak-check=full bin/flb-it-aws_credentials_http

Output

SUCCESS: All unit tests have passed.
==11448==
==11448== HEAP SUMMARY:
==11448==     in use at exit: 0 bytes in 0 blocks
==11448==   total heap usage: 2 allocs, 2 frees, 1,200 bytes allocated
==11448==
==11448== All heap blocks were freed -- no leaks are possible
==11448==
==11448== For lists of detected and suppressed errors, rerun with: -s
==11448== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

PettitWesley and others added 7 commits August 13, 2024 21:00
    This change brings the http credential provider
    in line with the latest spec and adds support for:
    - EKS Pod Identity
      - validate/support EKS credential link local IP 169.254.170.23
    - Latest HTTP Provider spec:
      - AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
      - AWS_CONTAINER_CREDENTIALS_FULL_URI
      - AWS_CONTAINER_AUTHORIZATION_TOKEN
      - AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE

Signed-off-by: Wesley Pettit <[email protected]>
Signed-off-by: Eduardo Silva <[email protected]>
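
The endpoint validation and variable handling listed in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the C code in this PR: the function names are hypothetical, and the precedence rules (relative URI before full URI, token file before inline token) and the "HTTPS to any host, plain HTTP only to loopback or known link-local addresses" rule are modeled on how AWS SDKs treat these variables. Only 169.254.170.23 is named in the commit message; 169.254.170.2 is the ECS credentials address.

```python
import ipaddress
from typing import Callable, Mapping, Optional
from urllib.parse import urlsplit

ECS_CREDENTIALS_HOST = "169.254.170.2"    # ECS task credentials endpoint
EKS_POD_IDENTITY_HOST = "169.254.170.23"  # EKS Pod Identity link-local IP
ALLOWED_HTTP_HOSTS = {ECS_CREDENTIALS_HOST, EKS_POD_IDENTITY_HOST}


def is_allowed_credentials_uri(uri: str) -> bool:
    """Accept HTTPS anywhere; accept plain HTTP only for trusted local hosts."""
    parts = urlsplit(uri)
    if parts.scheme == "https":
        return True
    if parts.scheme != "http" or not parts.hostname:
        return False
    if parts.hostname in ALLOWED_HTTP_HOSTS:
        return True
    try:
        return ipaddress.ip_address(parts.hostname).is_loopback
    except ValueError:
        # A DNS name over plain HTTP; this sketch rejects it without resolving.
        return False


def resolve_credentials_uri(env: Mapping[str, str]) -> Optional[str]:
    """Pick the endpoint; the relative URI is assumed to win over the full URI."""
    relative = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative:
        return "http://" + ECS_CREDENTIALS_HOST + relative
    full = env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    if full and is_allowed_credentials_uri(full):
        return full
    return None


def resolve_auth_token(
    env: Mapping[str, str], read_file: Callable[[str], str]
) -> Optional[str]:
    """Pick the token; the token file is assumed to win over the inline token."""
    token_file = env.get("AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE")
    if token_file:
        return read_file(token_file).strip()
    return env.get("AWS_CONTAINER_AUTHORIZATION_TOKEN")
```

With the environment that EKS Pod Identity injects, `resolve_credentials_uri` returns the full URI unchanged because its host is the allowed link-local address, and `resolve_auth_token` returns the contents of the token file.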
3 participants