Some aws_security_group_rules are not added to tfstate file. #2584
Here are some non-exhaustive logs extracted from Terraform for a failing rule called "egress_nat_http_to_all":
Some more details about the security_group that failed to get added to tfstate above:
Here is the rule as configured in terraform:
And the security group it relates to:
If you would like to see anything else, please let me know.
I'm seeing this exact same issue when using
In our case, it's worth noting that the security group is managed by Terraform. Glad someone else is seeing this, as it is causing us huge slowdowns and preventing us from spinning up new infrastructure regularly right now.
I just bumped into the same thing, with a security group not managed by Terraform.
Sorry for the trouble here, folks. Tagging and we'll get this looked at and fixed before 0.6.1.
I've started learning Go and building my own version of Terraform to try and help you guys tackle this. Based on commit ab0a7d8, which I think is very new, I now have a minimal set of security group rules. When I randomly destroy and create, sometimes the tfstate file is fine and a single terraform apply works; sometimes it gets itself in a twist. I have 24 aws_security_group_rules applied across 6 aws_security_groups. I introduced some debug logging to record the result of ipPermissionIDHash on each of these newly created resources. I noticed that out of 24 aws_security_group_rules, 2 have the same ipPermissionIDHash, but only on this run (last time they were all unique). Not sure what's going on there, as I haven't modified the rules in any way; maybe a race condition? In the state file, 2 of the rules have now ended up with the same ipPermissionIDHash, as follows:
Not sure if this is helpful, but it might be a start. You'll notice they depend on different security groups, but have ended up being applied to the same security group. Maybe that's it? This isn't quite the same as what I experienced at first, which was that the security group rules were omitted from the tfstate file altogether, but I am hunting for all the problems with this.
So problem 1: as you can see, the security group ID is not being considered as part of the ipPermissionIDHash, so the same rule on different security groups will get the same resource ID.
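For illustration, here is a minimal sketch of that collision (hypothetical names and ports, not from the actual config): two rules identical except for the group they attach to, which under the current hash would share one resource ID:

resource "aws_security_group_rule" "https_in_a" {
  type = "ingress"
  from_port = 443
  to_port = 443
  protocol = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.group_a.id}"
}
resource "aws_security_group_rule" "https_in_b" {
  # Identical rule body but a different group; since the group ID is not
  # part of the hash, this collides with the rule above.
  type = "ingress"
  from_port = 443
  to_port = 443
  protocol = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
  security_group_id = "${aws_security_group.group_b.id}"
}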
In the latest run, however, I got 8 missing rules in tfstate. I am now analysing problem 2.
Problem 2 is the one described by this ticket, and you can see it in an example run here: all ipPermissionIDHashes were generated (24 of them), only 2 matched because of problem 1 above, but 8 didn't make it into the tfstate file, as follows:
Thanks for all this context. This will all help, but we're still going to need a reproduction Terraform config I can run locally in order to get this fixed. Can you make one? Thanks!
@mitchellh we've seen this occasionally too; I'll take a crack at creating a small config that reproduces it now.
Had a difficult time trimming our configuration down while still reproducing. If you have a smaller config, could you post yours, @gtmtech?
With this code I can reproduce this bug by cycling
Was able to reproduce this with the following config, which is pretty small:
Ran it. Error message:
Do you still need an example from me?
👍 seeing this same issue in 0.6.1.
+1, this is a serious issue. We are hitting cases where rules that are created successfully are not making it into the state file, and subsequent runs fail as Terraform tries to create the existing rules.
Thanks for the example @jszwedko - we'll use this to reproduce on our side, investigate, and follow up with what we find.
Thanks for all the additional info everyone. I'm taking a fresh look at this now, sorry for all the trouble 😦
Hey @jszwedko – I believe that specific issue is a race condition with the AWS API. I reconstructed your example in this repo: I ran the snippet you shared. Here's a gist of at least 3 successful runs: If you could take a look at the repo above, maybe you can spot something I missed? That all said, I'm still looking into this issue. I believe there is a lingering issue with Security Group Rules that I've yet to nail down. Thanks!
I thought that originally @catsby, but the rules end up created fine in AWS; they just don't appear in the terraform tfstate file, meaning terraform has an inconsistent view of what's actually there in AWS. So we do a single run of terraform, creating a bunch of rules. And what gets created is EXACTLY what we specified in terraform, but what's in the tfstate file is missing some of the security group rules. So I don't believe it can possibly be an AWS problem; I think it's a terraform problem. Our security groups and security group rules are now big enough that this happens on EVERY terraform run. It's a serious issue for us.
@catsby hmm, the symptoms are the same (the rules are created in AWS, just not recorded in the state file). I'm happy to open this as a separate issue if you think that is different though.
I believe both issues will be addressed in an upcoming pull request that I'm working on. Security Group Rules are getting an update on how they're found and read from the API. It's a bit tricky because of how AWS groups the cidr blocks, but I'm working it out.
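To illustrate that grouping (a sketch with hypothetical names): the EC2 API reports permissions keyed by protocol and port range, collapsing all matching CIDR blocks into a single entry, so two separate Terraform rules like these come back from the API as one permission with two IP ranges:

resource "aws_security_group_rule" "ssh_from_office" {
  type = "ingress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  cidr_blocks = ["10.0.1.0/24"]
  security_group_id = "${aws_security_group.bastion.id}"
}
resource "aws_security_group_rule" "ssh_from_vpn" {
  # AWS merges this with the rule above into a single tcp/22 permission
  # containing both CIDR blocks, so reads must pick each rule back apart.
  type = "ingress"
  from_port = 22
  to_port = 22
  protocol = "tcp"
  cidr_blocks = ["10.0.2.0/24"]
  security_group_id = "${aws_security_group.bastion.id}"
}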
Hey @gtmtech – do you have a stripped-down example of rules not being saved to the state file? Thanks!
I just sent #3019 as a patch for some of the issues reported here. As mentioned above, there is a legitimate issue where Security Group Rules would fail to save correctly, both for the same Security Group and for the same rule applied to multiple Groups. Those issues should be fixed in #3019, but I need help reviewing and vetting that. There is another issue demonstrated here, which appears to be a race condition with the AWS API and its eventually consistent nature. That one I do not attempt to fix in #3019, and I don't believe there's much we can do about it. If possible, please check out the PR and let me know! Thanks all for the help here, and sorry again for the delay.
Sorry, I've been sending GitHub mails to junk and didn't see the progress on this. Do you still want help testing?
Sorry to say this isn't fixed. It did actually look a bit better: I usually get 20 rules which miss going into the state file, and on the latest run I got only 1. However, as it's a race condition, I can't be sure; I can only say it's still not totally fixed.

$ terraform version
Thanks @gtmtech, I'm glad it seems to have at least improved. I do have some questions, if you don't mind:
Sorry for the barrage of questions; I'm still trying to hammer all the bugs out here :/
Yep, to answer your questions: the error message is exactly the same as before. There are no errors on the first apply (all rules get created in AWS), but on the second apply there are one or two rules that attempt to apply (as they didn't get saved into the tfstate file); because they are already in AWS, the error is the duplicate rule error that I first referenced at the top of this issue, i.e.:
Yes, they get added to AWS (hence them erroring as Duplicate on the next apply).
Every time I've re-terraformed my env, I get the problem, yes.
Every time it is random which rules don't get saved: sometimes one, sometimes two, and different rules each time, it seems to me.
Correct: terraform plan thinks it needs to add the one or two rules that didn't make it into the tfstate file (but are in AWS, hence on apply it errors with Duplicate).
It errors with the above error message saying there are duplicates.
For those with flexibility on their overall security group design, here's a workaround while this bug is worked on. It requires a specific security group design, but the design has benefits of its own (noted at the end of this comment).
Every instance gets at least the following two security groups, each with a different purpose: an ingress group holding only the instance's inbound rules, and an egress group holding only its outbound rules.
Here's a contrived example with a circular dependency. A docker registry machine provides a redis machine with access to pull the private redis docker image, while the redis machine provides access to the docker registry for caching image layers in redis. No need for any aws_security_group_rule resources.

# docker registry
resource "aws_security_group" "dockerregistry-ingress" {
name = "dockerregistry-ingress"
description = "Docker Registry (Ingress)"
vpc_id = "${var.vpc_id}"
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
security_groups = ["${aws_security_group.redis-egress.id}"]
}
}
resource "aws_security_group" "dockerregistry-egress" {
name = "dockerregistry-egress"
description = "Docker Registry (Egress)"
vpc_id = "${var.vpc_id}"
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# redis
resource "aws_security_group" "redis-ingress" {
name = "redis-ingress"
description = "Redis (Ingress)"
vpc_id = "${var.vpc_id}"
ingress {
from_port = 6379
to_port = 6379
protocol = "tcp"
security_groups = ["${aws_security_group.dockerregistry-egress.id}"]
}
}
resource "aws_security_group" "redis-egress" {
name = "redis-egress"
description = "Redis (Egress)"
vpc_id = "${var.vpc_id}"
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}

Since moving to this design, it has been much easier for us to identify which rules belong to which security groups, with fewer lines of code. Avoiding this bug was just a lucky bonus :)
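To round out the workaround (a sketch; the instance arguments are assumptions, not from the original comment), each machine then attaches its own ingress and egress groups:

resource "aws_instance" "redis" {
  # hypothetical ami/type/subnet, just to make the sketch complete
  ami = "${var.redis_ami}"
  instance_type = "t2.micro"
  subnet_id = "${var.subnet_id}"
  vpc_security_group_ids = [
    "${aws_security_group.redis-ingress.id}",
    "${aws_security_group.redis-egress.id}",
  ]
}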
I created a new issue at #3498, but after more investigation my issue appears to be the same as this one. Every time, without fail, exactly the same 23 of my resources (I have more than that) show as needing to be created in the plan. This is really impacting my ability to use Terraform now. Does anyone know if any progress has been made on this, please?
@catsby Awesome. Will give it a go later. When can we expect the next release?
@joelmoss "soon" 😄
How did I know you'd say that ;)
Ok, so just tried this on master and all good! thx loads
We are still hitting the issue quite hard: security group rules are not getting recorded during creation. We get about a 40-50% failure rate (of the most recent 5 runs, 3 had the issue), on the newest 0.6.14 version. I'm attaching state files, one for a successful build and two for bad builds; you can see that many rules are missing. Should I file a full bug report, or is that enough? If you point me in the right direction I can try to fix this myself, though I've never dabbled with Go.

Good creation:
Bad creations:
I'm still getting this issue... Let me know what I can provide that would be useful for you to fix this :)
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
We thought #2366 in Terraform 0.6.0 might fix this issue, but we have tested on 0.5.3 and 0.6.0 and it is still broken.
We have a fairly large configuration full of aws_security_group_rules, and sometimes multiple security_groups applied to aws_instances. We like having individual rules rather than lots of rules in an aws_security_group, because they can be labelled, and previous bugs with aws_security_group when changing some of the rules pushed us down the aws_security_group_rules route. (We like managing the rules independently of the security group.)
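For context, a standalone rule of the kind we mean looks roughly like this (a sketch: only the resource name comes from the logs above; the ports, protocol, and group reference are assumptions):

resource "aws_security_group_rule" "egress_nat_http_to_all" {
  type = "egress"
  from_port = 80
  to_port = 80
  protocol = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
  # hypothetical group reference
  security_group_id = "${aws_security_group.nat.id}"
}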
The problem is, on a fresh terraform apply, terraform reports that all aws_security_group_rules have been created, but some of them (a random selection each time) are not added to the tfstate file. This means that a further terraform plan yields further rules to be created, but because they do exist in Amazon, a further terraform apply does not work, as they come back with "duplicate rule".
I ran the whole thing in TF_LOG=debug mode, so I have captured everything, and have tried to show the relevant bits here (as I don't want to share the entire config of what I'm doing). The key fact is that each time a fresh terraform apply (from nothing) is done, a random set of rules fails to make it into the tfstate file.
I will shortly update this with the relevant snippets of logs/code etc.