Allow in-place upgrades of ES domains #6243
I think this is probably fine for a first pass, but longer term there needs to be a configurable timeout block for Elasticsearch domains, as I'm wary that upgrades of large ES domains could take a fair bit longer than 60 minutes. We should also think about validating the upgrade path using Not sure if this needs documenting anywhere, but if so we should probably at least link to the valid upgrade paths documentation. |
AWS now allows in-place upgrades of some versions of ES domains. Currently, an invalid upgrade path silently passes without making any changes. For the list of all valid upgrade paths see https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-version-migration.html
I was messing around with |
I'm not sure it's necessary to have the migration path detection initially, but it would be a welcome addition to this PR. Unfortunately, after cherry picking your change across onto my work, it doesn't seem to work and the plan is always showing it can upgrade in place:
I'm finished for the day now so can't take a longer look until probably next week if you're still stuck but if you do get it working I'd be more than happy to cherry pick your commits on to this change. |
I just updated that commit based on your feedback and forked your branch and it appears to be working as expected. https://github.com/rifelpet/terraform-provider-aws/tree/es_upgrade
Agreed that it would be a nice to have and not necessary for the initial implementation |
Awesome, thanks @rifelpet. I had a quick play with the branch and it seems to work now. I've left some feedback on your commit rifelpet@0ad0cc7; it would be good to know whether you agree with it. I think we can probably add your changes to this branch, as they simplify things a bit and make upgrade paths more obvious from a user perspective. I'd also like to think about the acceptance test(s) a little more, and whether we can test that expected paths actually result in in-place or not-in-place upgrades. Right now all the test proves is that the version changes; it doesn't tell you whether it did an in-place upgrade or destroyed and rebuilt the ES domain. @bflad is the acceptance test issue mentioned above something you'd want to see added before this is merged? Without spending much time thinking about it, I'm not sure what the best way to detect this is right now, but I can take a longer look, or if you happen to know of a good way to do that then a pointer in the right direction would be much appreciated. I know the |
@tomelliff generally by preference, we can check if:
For Elasticsearch domains, it looks like Here's an example implementation of one: |
I can't see anything that exposes anything unique in the calls we're making right now but we could look at the Want to block this until I've had a chance to work through necessary changes for the tests? |
@tomelliff I addressed most of your feedback in a new commit to my branch. I can squash it when we're ready.
Regarding returning an error if the GetCompatibleElasticsearchVersions call fails, I'm not sure we can do that. The function signature doesn't return an error type, and none of the other instances of
Regarding the GetCompatibleElasticsearchVersions response: If a version has no compatible upgrades, it returns an empty array. This is a 2.3 cluster:
{
  "CompatibleElasticsearchVersions": []
}
If it does have compatible upgrades, the response looks like:
{
  "CompatibleElasticsearchVersions": [
    {
      "SourceVersion": "5.5",
      "TargetVersions": [
        "5.6"
      ]
    }
  ]
}
I believe |
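For readers following along, here is a minimal stdlib-only sketch of how the two response shapes quoted above could be turned into an "in place or not" decision. It is not the provider's code: the type `compatibleVersions` and the function `canUpgradeInPlace` are hypothetical names, and the sketch decodes raw JSON rather than calling the AWS SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// compatibleVersions mirrors the JSON shape quoted in the comment above.
// The Go type and function names are hypothetical, not the provider's.
type compatibleVersions struct {
	CompatibleElasticsearchVersions []struct {
		SourceVersion  string
		TargetVersions []string
	}
}

// canUpgradeInPlace reports whether target appears in the list of
// compatible target versions returned for the domain.
func canUpgradeInPlace(raw []byte, target string) (bool, error) {
	var resp compatibleVersions
	if err := json.Unmarshal(raw, &resp); err != nil {
		return false, err
	}
	// An empty array (the 2.3 cluster case) falls straight through to false.
	for _, m := range resp.CompatibleElasticsearchVersions {
		for _, v := range m.TargetVersions {
			if v == target {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	none := []byte(`{"CompatibleElasticsearchVersions": []}`)
	some := []byte(`{"CompatibleElasticsearchVersions": [{"SourceVersion": "5.5", "TargetVersions": ["5.6"]}]}`)

	for _, c := range []struct {
		raw    []byte
		target string
	}{{none, "5.1"}, {some, "5.6"}, {some, "6.3"}} {
		ok, err := canUpgradeInPlace(c.raw, c.target)
		fmt.Println(c.target, ok, err)
	}
}
```

In this sketch a decode failure is returned as an error, while an unsupported path is simply `false`. Whether the real diff logic can surface an error is exactly the open question raised in the comment.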
Derp, misread the thing you were length checking, that's fine, sorry. Changes look good to me if you want to squash and I'll cherry pick those changes on to this branch and actually push it up to my fork. |
I squashed my branch, feel free to cherry pick the commit. |
@rifelpet done, thanks for that, really nice addition. I'll take a look at the acceptance test(s) to test when we do/don't delete the cluster when updating the version when I get a chance later this week. |
Do we know how Terraform will behave if an upgrade takes longer than the 1h timeout? Will it try to re-upgrade, which will presumably fail? There is a Most of the clusters I manage with Terraform are over a terabyte of data, so I think I'll be hitting the timeout, but I won't be able to test it until this gets merged and released. |
I don't have a good way to test that without writing a ton of useless data to an index that I'm happy to murder during testing, but if you could snap an existing large cluster and restore it into another domain that you are happy to play with, that would be good. I think Terraform will time out and throw an error. If the user then attempts to run an upgrade while the existing upgrade is in progress, the AWS API will throw an error. If they wait until the upgrade is complete, the refresh will then show nothing to do on the plan. Probably worth checking that assumption though, and yeah, we do need to think about configurable timeouts, but I'd rather leave that to a separate, smaller PR for now to minimise the change scope. |
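The expected timeout behaviour described above can be illustrated with a small polling loop. This is only a sketch of the general pattern, assuming a hypothetical `status` callback in place of whatever call reports upgrade progress; it is not how the provider or the Terraform SDK implements waiting.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTimeout = errors.New("timed out waiting for upgrade")

// waitForUpgrade polls status until it reports done, returns an error, or
// the timeout elapses. status is a hypothetical stand-in for the real check.
func waitForUpgrade(status func() (done bool, err error), timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := status()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		// Give up if the next poll would land after the deadline.
		if time.Now().Add(interval).After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Fake upgrade that completes on the third poll.
	quick := func() (bool, error) { calls++; return calls >= 3, nil }
	fmt.Println(waitForUpgrade(quick, time.Second, time.Millisecond))

	// Fake upgrade that outlives the timeout: the wait gives up, but
	// nothing here cancels the (simulated) upgrade itself.
	slow := func() (bool, error) { return false, nil }
	fmt.Println(waitForUpgrade(slow, 20*time.Millisecond, 5*time.Millisecond))
}
```

The second call shows the case being discussed: the waiter returns an error while the remote operation may well still be running, so a later refresh would be what picks up the final state.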
@bflad what work is remaining to get this merged? |
I need to make the acceptance test actually verify we didn't delete the ES cluster when upgrading. It requires a bit of rework of how we're testing things and unfortunately I've only had time to contribute to open source projects on my commute where I don't have a stable enough internet connection to be building and deleting ES clusters because they take so damn long to apply and destroy. Hopefully I'll get some time over the Christmas break or possibly even this weekend. |
Acceptance test after most recent commits:
|
…supported upgrade path Not currently checking that unsupported upgrade paths lead to the domain being recreated. Unfortunately we can't just switch all of the tests and helpers over to using the ElasticSearchDomainConfig because the testAccLoadESTags helper needs the ARN, which isn't available or easily retrievable when using the ElasticSearchDomainConfig struct.
This PR was originally created before the move to parallel acceptance tests so needed fixing up.
Sigh. Just realised I was comparing the same created time to itself by making a copy/paste error. Force pushed fix back over the top and rerunning the insanely long acceptance test now. |
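As a rough illustration of the check being described (comparing the domain's created time before and after the version change), here is a self-contained sketch. `domainSnapshot` and `checkUpgradedInPlace` are hypothetical names, and the timestamps are made up; the real acceptance test works against the provider's own helpers.

```go
package main

import (
	"fmt"
	"time"
)

// domainSnapshot is a hypothetical record of what a test captures after
// each apply; it is not a type from the provider.
type domainSnapshot struct {
	Version string
	Created time.Time
}

// checkUpgradedInPlace returns an error unless the version changed while
// the creation time stayed the same, i.e. the domain was not rebuilt.
func checkUpgradedInPlace(before, after domainSnapshot) error {
	if before.Version == after.Version {
		return fmt.Errorf("version did not change: still %s", after.Version)
	}
	if !before.Created.Equal(after.Created) {
		return fmt.Errorf("domain was recreated: created %s, then %s", before.Created, after.Created)
	}
	return nil
}

func main() {
	// Illustrative timestamp only.
	created := time.Date(2018, time.December, 1, 9, 0, 0, 0, time.UTC)
	before := domainSnapshot{Version: "5.5", Created: created}
	inPlace := domainSnapshot{Version: "5.6", Created: created}
	rebuilt := domainSnapshot{Version: "5.6", Created: created.Add(30 * time.Minute)}

	fmt.Println(checkUpgradedInPlace(before, inPlace))
	fmt.Println(checkUpgradedInPlace(before, rebuilt))
}
```

Requiring the version to differ as well means that accidentally passing the same snapshot twice fails loudly instead of trivially passing the creation-time comparison.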
Acceptance test completed fine:
|
Great work @tomelliff and @rifelpet! LGTM 🚀
--- PASS: TestAccAWSElasticSearchDomain_encrypt_at_rest_default_key (753.89s)
--- PASS: TestAccAWSElasticSearchDomain_importBasic (762.44s)
--- PASS: TestAccAWSElasticSearchDomain_LogPublishingOptions (819.86s)
--- PASS: TestAccAWSElasticSearchDomain_policy (825.95s)
--- PASS: TestAccAWSElasticSearchDomain_v23 (868.39s)
--- PASS: TestAccAWSElasticSearchDomain_complex (987.62s)
--- PASS: TestAccAWSElasticSearchDomain_encrypt_at_rest_specify_key (1144.27s)
--- PASS: TestAccAWSElasticSearchDomain_vpc (1226.03s)
--- PASS: TestAccAWSElasticSearchDomain_NodeToNodeEncryption (1317.03s)
--- PASS: TestAccAWSElasticSearchDomain_duplicate (1478.16s)
--- PASS: TestAccAWSElasticSearchDomain_basic (1617.45s)
--- PASS: TestAccAWSElasticSearchDomain_tags (1671.03s)
--- PASS: TestAccAWSElasticSearchDomain_internetToVpcEndpoint (2259.24s)
--- PASS: TestAccAWSElasticSearchDomain_CognitoOptionsCreateAndRemove (2278.74s)
--- PASS: TestAccAWSElasticSearchDomain_update (2343.72s)
--- PASS: TestAccAWSElasticSearchDomain_vpc_update (2650.67s)
--- PASS: TestAccAWSElasticSearchDomain_CognitoOptionsUpdate (2894.05s)
--- PASS: TestAccAWSElasticSearchDomain_update_volume_type (3670.57s)
--- PASS: TestAccAWSElasticSearchDomain_update_version (3812.46s)
--- PASS: TestAccAWSElasticSearchDomain_withDedicatedMaster (3842.67s)
This has been released in version 1.55.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
AWS has announced support for in-place upgrades of some versions of ES domains.
Currently, an invalid upgrade path silently passes without making any changes.
A refresh will show the domain with its correct, unaltered version on the next plan/apply.
For the list of all valid upgrade paths see https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-version-migration.html
Fixes #5554
Results from running the included acceptance test:
So very slow 😢