fix(issue-4448): aws route53 inconsistent domain name handling - octal escapes #4582

Merged
merged 27 commits into from
Sep 17, 2024

Conversation

ivankatliarchuk
Contributor

@ivankatliarchuk ivankatliarchuk commented Jun 29, 2024

Description

External DNS doesn't fully decode domain names retrieved from AWS Route 53, potentially leading to issues with mismatched characters due to unhandled octal escape sequences.

aws route53 list-resource-record-sets --hosted-zone-id ASDFASFASFASDF
{
    "ResourceRecordSets": [
        {
            "Name": "wiremock-\\045\\041s\\050\\074nil\\076\\051-internal-eks--internal-eks.example.com.",
            "Type": "CNAME",
            "TTL": 60,
            "ResourceRecords": [
                {
                    "Value": "eks.example.com"
                }
            ]
        },
        {
            "Name": "x-cname-vault.example.com.",
            "Type": "TXT",
            "TTL": 300,
            "ResourceRecords": [
                {
                    "Value": "\"heritage=external-dns,external-dns/owner=extdns,external-dns/resource=crd/vault/vault.example.com\""
                }
            ]
        },
        {
            "Name": "x-cname-wiremock-\\045\\041s\\050\\074nil\\076\\051-internal-eks--internal-eks.example.com.",
            "Type": "TXT",
            "TTL": 300,
            "ResourceRecords": [
                {
                    "Value": "\"heritage=external-dns,external-dns/owner=extdns,external-dns/resource=crd/apps/wiremock.example.com\""
                }
            ]
        },
        {
            "Name": "x-vault.example.com.",
            "Type": "TXT",
            "TTL": 300,
            "ResourceRecords": [
                {
                    "Value": "\"heritage=external-dns,external-dns/owner=extdns,external-dns/resource=crd/vault/vault.example.com\""
                }
            ]
        },
        {
            "Name": "x-wiremock-\\045\\041s\\050\\074nil\\076\\051-internal-eks--internal-eks.example.com.",
            "Type": "TXT",
            "TTL": 300,
            "ResourceRecords": [
                {
                    "Value": "\"heritage=external-dns,external-dns/owner=extdns,external-dns/resource=crd/apps/wiremock.example.com\""
                }
            ]
        }
    ]
}

The record is actually created in Route 53, and a rollback does not help: external-dns reads the poisoned record back and goes into CrashLoopBackOff.
The only way to recover external-dns is to remove the records from Route 53 with the command aws route53 change-resource-record-sets --hosted-zone-id $HOSTED_ZONE_ID --change-batch file:///delete.json, and to remove the corresponding resources from Kubernetes.
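As an illustration of the recovery step, the following Go sketch builds a change batch that could be saved as delete.json. It is a minimal example, not part of this PR: the type names and newDeleteBatch are hypothetical, the record values are copied from the listing above, and it assumes Route 53 expects the name in the same escaped form it returned. A DELETE change has to repeat the record's Name, Type, TTL and values exactly.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The types below mirror the JSON shape of a Route 53 change batch.
// They are hypothetical helpers for this sketch, not external-dns types.
type resourceRecord struct {
	Value string `json:"Value"`
}

type resourceRecordSet struct {
	Name            string           `json:"Name"`
	Type            string           `json:"Type"`
	TTL             int64            `json:"TTL"`
	ResourceRecords []resourceRecord `json:"ResourceRecords"`
}

type change struct {
	Action            string            `json:"Action"`
	ResourceRecordSet resourceRecordSet `json:"ResourceRecordSet"`
}

type changeBatch struct {
	Changes []change `json:"Changes"`
}

// newDeleteBatch wraps every given record set in a DELETE change.
func newDeleteBatch(sets ...resourceRecordSet) changeBatch {
	batch := changeBatch{}
	for _, s := range sets {
		batch.Changes = append(batch.Changes, change{Action: "DELETE", ResourceRecordSet: s})
	}
	return batch
}

func main() {
	// Values taken from the CNAME record in the listing above; the raw
	// string keeps the single backslashes that JSON shows as "\\".
	batch := newDeleteBatch(resourceRecordSet{
		Name:            `wiremock-\045\041s\050\074nil\076\051-internal-eks--internal-eks.example.com.`,
		Type:            "CNAME",
		TTL:             60,
		ResourceRecords: []resourceRecord{{Value: "eks.example.com"}},
	})

	out, err := json.MarshalIndent(batch, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // redirect to delete.json
}
```

The three TXT registry records from the listing would need their own entries in the same batch, each with its exact TTL and quoted value.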

The issue arises from how the AWS Route 53 API handles domain names with special characters. According to the AWS Route 53 Developer Guide on Domain Name Format, the API escapes special characters into their ASCII representations using three-digit octal codes.

When the AWS Route 53 API returns domain names containing special characters, these characters are escaped into their octal ASCII format. However, the current implementation in the external-dns codebase does not correctly handle this conversion. As a result, the domain names returned by the API are not properly converted back to their original format, leading to discrepancies and potential issues in the functionality relying on these domain names.

For example, a domain name such as wiremock-%!s(<nil>), which contains special characters, would be returned as wiremock-\\045\\041s\\050\\074nil\\076\\051 by the API. The current code does not decode octal escapes such as \\050 back to (, thus causing an incorrect representation of the domain name.

To resolve this issue, the code needs to be updated to include a mechanism that correctly decodes these escaped special characters back into their original ASCII format.
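A minimal sketch of such a decoding step is shown below. It illustrates the idea only and is not necessarily the code merged in this PR; the function name decodeOctalEscapes is hypothetical. It assumes every escape is a backslash followed by exactly three octal digits and leaves anything else untouched.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeOctalEscapes replaces each backslash followed by three octal digits
// (for example `\050`) with the single byte it encodes. Sequences that are
// truncated, non-octal, or larger than one byte are copied through unchanged.
// Hypothetical helper for illustration, not the external-dns implementation.
func decodeOctalEscapes(name string) string {
	if !strings.Contains(name, `\`) {
		return name // nothing to decode
	}
	var b strings.Builder
	b.Grow(len(name))
	for i := 0; i < len(name); i++ {
		if name[i] == '\\' && i+3 < len(name) {
			// Base 8, at most 8 bits: rejects signs, non-octal digits and values above \377.
			if v, err := strconv.ParseUint(name[i+1:i+4], 8, 8); err == nil {
				b.WriteByte(byte(v))
				i += 3 // skip the three digits just consumed
				continue
			}
		}
		b.WriteByte(name[i])
	}
	return b.String()
}

func main() {
	// The raw string holds single backslashes, as the API returns them
	// (the JSON listing shows each one doubled).
	escaped := `wiremock-\045\041s\050\074nil\076\051-internal-eks--internal-eks.example.com.`
	fmt.Println(decodeOctalEscapes(escaped))
	// prints: wiremock-%!s(<nil>)-internal-eks--internal-eks.example.com.
}
```

Names without a backslash take the early return, so ordinary records are not copied or rescanned.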

The original name is 48 characters long: x-wiremock-%!s(<nil>)-internal-eks--internal-eks
When encoded it is 64 characters long, e.g. wiremock-\\045\\041s\\050\\074nil\\076\\051-internal-eks--internal-eks

Fixes #4448

Expected behavior:

  • external-dns does not crash

Tested on AWS EKS cluster 1.26 and 1.28

Checklist

  • Unit tests updated
  • Tested from local against real AWS EKS cluster
  • End user documentation updated

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 29, 2024
@k8s-ci-robot
Contributor

Hi @ivankatliarchuk. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 29, 2024
@ivankatliarchuk ivankatliarchuk changed the title fix(issue-4448): make sure nulls not propagated WIP fix(issue-4448): make sure nulls not propagated Jun 29, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 29, 2024
@ivankatliarchuk ivankatliarchuk marked this pull request as draft June 29, 2024 12:23
@ivankatliarchuk ivankatliarchuk changed the title WIP fix(issue-4448): make sure nulls not propagated fix(issue-4448): make sure nulls not propagated Jun 29, 2024
Signed-off-by: ivan katliarchuk <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 29, 2024
@ivankatliarchuk ivankatliarchuk changed the title fix(issue-4448): make sure nulls not propagated fix(issue-4448): handle null cases Jun 29, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 30, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jun 30, 2024
@ivankatliarchuk ivankatliarchuk changed the title fix(issue-4448): handle null cases fix(issue-4448): route53 handle domain names with special characters escaped to octal Jun 30, 2024
@ivankatliarchuk ivankatliarchuk changed the title fix(issue-4448): route53 handle domain names with special characters escaped to octal fix(issue-4448): aws route53 inconsistent domain name handling - octal escapes Jun 30, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 30, 2024
@ivankatliarchuk ivankatliarchuk marked this pull request as ready for review June 30, 2024 12:43
@mloiseleur
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 9, 2024
@ivankatliarchuk
Contributor Author

When can we expect this to be released or added to the next release milestone?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 7, 2024
Review comment on provider/aws/aws.go (outdated, resolved)
@szuecs
Contributor

szuecs commented Sep 10, 2024

A small comment and please rebase, then I am happy to approve this PR.

ivankatliarchuk and others added 4 commits September 14, 2024 09:56
Signed-off-by: ivan katliarchuk <[email protected]>
* master: (78 commits)
  Update README.md with Efficient IP Provider
  feat(chart): Updated image to v0.15.0
  fix(chart): Don't use unauthenticated webhook port for health probe
  Remove unused session logic after move to aws-sdk-go-v2
  Refactor AWS provider to aws-sdk-go-v2
  Refactor AWS Cloud Map provider to aws-sdk-go-v2
  Refactor DynamoDB registry to aws-sdk-go-v2
  Update docs/release.md
  update the docs to v0.15.0
  bump kustomize version to v0.15.0
  add deprecation notice on coredns tutorial
  docs: refactor title and organisation
  review with Raffo
  chore: remove unmaintained providers
  chore(deps): bump actions/setup-python in the dev-dependencies group
  Add RouterOS provider to README.md
  feat: add annotation and label filters to Ambassador Host Source (kubernetes-sigs#2633)
  chore(deps): bump GrantBirki/json-yaml-validate
  fix linter
  fix ordering
  ...
Signed-off-by: ivan katliarchuk <[email protected]>
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 14, 2024
Signed-off-by: ivan katliarchuk <[email protected]>
Signed-off-by: ivan katliarchuk <[email protected]>
Signed-off-by: ivan katliarchuk <[email protected]>
@ivankatliarchuk
Contributor Author

ivankatliarchuk commented Sep 14, 2024

Rebased and removed the whitespace changes to the licence block. Lost the lgtm label; do I need to request /lgtm again?

@mloiseleur
Contributor

/lgtm
I'll let @szuecs do the final approve. Thanks @ivankatliarchuk !

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 14, 2024
@szuecs
Contributor

szuecs commented Sep 17, 2024

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: szuecs

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 17, 2024
@szuecs
Contributor

szuecs commented Sep 17, 2024

Thank you!

@k8s-ci-robot k8s-ci-robot merged commit 05cd406 into kubernetes-sigs:master Sep 17, 2024
13 checks passed
@ivankatliarchuk ivankatliarchuk deleted the fix-crash-loop branch September 19, 2024 11:27
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

External-DNS crash loop due to nil error in DNSEntry object
4 participants