-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reducing calls to checkConflictingNodes() #915
Conversation
calico_node/startup/startup.go
Outdated
// we need to update it and check for conflicts. | ||
if node.Spec.BGP.IPv6Address == nil || !node.Spec.BGP.IPv6Address.IP.Equal(cidr.IP) { | ||
node.Spec.BGP.IPv6Address = cidr | ||
checkConflictingNodes(client, node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you potentially call this twice - once for ipv4 and once for ipv6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gah you're right, which kinda defeats the purpose, will refactor it to only call once
calico_node/startup/startup.go
Outdated
@@ -84,14 +84,11 @@ func main() { | |||
node := getNode(client, nodeName) | |||
|
|||
// Configure and verify the node IP addresses and subnets. | |||
configureIPsAndSubnets(node) | |||
configureIPsAndSubnets(client, node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit - rather than pass the client here it might be nicer to make it so configureIPsAndSubnets
returns whether or not the an IP on the node changed, and we can use it below.
calico_node/startup/startup.go
Outdated
node.Spec.BGP.IPv4Address = cidr | ||
// We autodetected an IPv4 address, if we have no previous IP, or it is different than the previous IP, | ||
// we need to update it and check for conflicts. | ||
if node.Spec.BGP.IPv4Address == nil || !node.Spec.BGP.IPv4Address.IP.Equal(cidr.IP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this only checks the IP part of the CIDR, not the netmask.
I think we might want to set node.Spec.BGP.IPv4Address = cidr
always - it's possible that the IP stays the same but the subnet changes.
In the original issue, there was also a suggestion that we add a config option to disable the check entirely - @heschlie could you add that to this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of comments.
@heschlie we'll need a corresponding docs PR to document the new configuration option. Could you link to it here once you've created it? |
Does it need a separate PR or could/should it just be in this PR? |
@heschlie oops, forgot we're in the Calico repo now! It can/should share this PR. |
calico_node/startup/startup.go
Outdated
@@ -267,7 +274,11 @@ func configureIPsAndSubnets(node *api.Node) { | |||
adm := os.Getenv("IP_AUTODETECTION_METHOD") | |||
cidr := autoDetectCIDR(adm, 4) | |||
if cidr != nil { | |||
// We autodetected an IPv4 address so update the value in the node. | |||
// We autodetected an IPv4 address, if we have no previous IP, or it is different than the previous IP, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't notice before, but this check is not quite sufficient as it only covers the auto-detection case.
I think we also need to handle the case when the IP address has been explicitly specified in the environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that's true, I'll add similar checks there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps easier to store the initial IPv4 and IPv6 addresses and do the comparison once at the end of the fn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is wrong now that we've moved the check outside of this branch - should probably just be reverted to what it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for a few other comments in this PR.
calico_node/startup/startup.go
Outdated
if node.Spec.BGP.IPv6Address == nil || !node.Spec.BGP.IPv6Address.IP.Equal(envIp6.IP) { | ||
checkConflicts = true | ||
} | ||
node.Spec.BGP.IPv6Address = envIp6 | ||
} | ||
validateIP(node.Spec.BGP.IPv6Address) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be nicer and less code to store off the original value at the top of the function, and compare it to the new value at the bottom?
e.g.
oldv4 := node.Spec.BGP.IPv4Address
oldv6 := ...
// Rest of function.
...
if node.Spec.BGP.IPv4Addres != oldv4 || node.Spec.BGP.IPv6Address != oldv6 {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup I think I suggested the same thing earlier :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, missed that :)
Just driving by.
6950940
to
468b7bd
Compare
@caseydavenport @robbrockbank I think this should be good to go now, docs for new env var are in and the |
calico_node/startup/startup.go
Outdated
@@ -274,7 +285,8 @@ func configureIPsAndSubnets(node *api.Node) { | |||
} | |||
} else { | |||
if ipv4Env != "" { | |||
node.Spec.BGP.IPv4Address = parseIPEnvironment("IP", ipv4Env, 4) | |||
envIp := parseIPEnvironment("IP", ipv4Env, 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should revert this change - not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for IPv6 below.
@heschlie I think we need a unit test for this. |
calico_node/startup/startup.go
Outdated
if oldIpv4 == nil || !node.Spec.BGP.IPv4Address.IP.Equal(oldIpv4.IP) { | ||
checkConflicts = true | ||
} | ||
if oldIpv6 == nil || !node.Spec.BGP.IPv6Address.IP.Equal(oldIpv6.IP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an IPv4 deployment, oldIpv6 will always be nil every time we start, so we'll always perform the conflict check.
This should be:
if (oldIpv6 == nil && node.Spec.BGP.IPv6Address != nil) || (oldIpv6 != nil && !node.Spec.BGP.IPv6Address.IP.Equal(oldIpv6.IP))
or something like that
calico_node/startup/startup.go
Outdated
} | ||
validateIP(node.Spec.BGP.IPv6Address) | ||
} | ||
|
||
// Detect if we've seen the IP address change, and flag that we need to check for conflicting Nodes | ||
if oldIpv4 == nil || !node.Spec.BGP.IPv4Address.IP.Equal(oldIpv4.IP) { | ||
checkConflicts = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do you really need to define checkConflicts? You could just return true here.
} | ||
validateIP(node.Spec.BGP.IPv6Address) | ||
} | ||
|
||
// Detect if we've seen the IP address change, and flag that we need to check for conflicting Nodes | ||
if oldIpv4 == nil || !node.Spec.BGP.IPv4Address.IP.Equal(oldIpv4.IP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should log in these two branches.
@@ -16,6 +16,7 @@ The `calico/node` container is primarily configured through environment variable | |||
| IP6 | The IPv6 address for Calico will bind to. When specified, the address is saved in the [node resource configuration]({{site.baseurl}}/{{page.version}}/reference/calicoctl/resources/node) for this host, overriding any previously configured value. When omitted, if an address has not yet been configured in the node resource, IPv6 routing is not enabled. When omitted, if an IPv6 address has been previously configured in the node resource, IPv6 is enabled using the already configured address. | IPv6 | | | |||
| IP_AUTODETECTION_METHOD| The method to use to autodetect the IPv4 address for this host. This is only used when the IPv4 address is being autodetected. See [IP Autodetection methods](#ip-autodetection-methods) for details of the valid methods. | string | first-found | | |||
| IP6_AUTODETECTION_METHOD| The method to use to autodetect the IPv6 address for this host. This is only used when the IPv6 address is being autodetected. See [IP Autodetection methods](#ip-autodetection-methods) for details of the valid methods. | string | first-found | | |||
| DISABLE_NODE_IP_CHECK| Skips checks for duplicate Node IPs. This can reduce the load on the cluster when a large number of Nodes are restarting. | string | false | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string -> boolean
|
||
// If we report an IP change (v4 or v6) we should verify there are no | ||
// conflicts between Nodes. | ||
if checkConflicts && os.Getenv("DISABLE_NODE_IP_CHECK") != "true" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we should be more forgiving about the value that people can pass in - I think anything that resembles a boolean true should be sufficient: true t y 1 (case insensitive)
@caseydavenport @heschlie WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we respect for other true/false variables? Seems fine/good to do, but I'm happy to merge without.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'm happy to merge without too. I'll raise an issue to track better handling of boolean envs.
@@ -56,7 +56,7 @@ def test_bgp_backends(self): | |||
workload_host3 = host3.create_workload("workload3", network=network1, ip=DEFAULT_IPV4_ADDR_3) | |||
|
|||
# Allow network to converge | |||
self.assert_true(workload_host1.check_can_ping(workload_host2.ip, retries=10)) | |||
self.assert_true(workload_host1.check_can_ping(workload_host2.ip, retries=15)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm.. I wonder why these changes are required.. have we regressed some sort of timing thing? Any ideas @robbrockbank?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea - these should not need to be updated for this PR. Perhaps local environment is slower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to reduce the number of inconsistent test failures I was seeing, will revert as I don't think it made any difference.
I've Added in some UT's in a table driven test in the |
calico_node/startup/startup.go
Outdated
@@ -236,7 +239,7 @@ func getNode(client *client.Client, nodeName string) *api.Node { | |||
|
|||
// configureIPsAndSubnets updates the supplied node resource with IP and Subnet | |||
// information to use for BGP. | |||
func configureIPsAndSubnets(node *api.Node) { | |||
func configureIPsAndSubnets(node *api.Node) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final thing: I think the comment should indicate that it returns true if the node IP addresses are updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - one final comment, but I don't need to see it again.
Need to squash before merging.
I've updated the UT to do some IPv6 checks as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@heschlie could you please squash before merging? Thanks!
Oh, and could you tweak the release note to talk about "calico/node" instead of the startup script? That's more user-facing. |
This fix should result in less calls to `checkConflictingNodes()` which makes API calls to `client.Nodes.List()` which is a very expensive call. Added in env car check to `DISABLE_NODE_IP_CHECK`, will skip conflict checks if set to "true". Document changes for new env var Feedback update - Minor fixes from feedback - Reverted retry counts in STs - Created some UTs for new functionality in `configureIPsAndSubnets()` Testing IPv6 for changes Rewrote `makeNode()` to create IPv6 as well, added some test cases for IPv6
…-release-v3.21 [release-v3.21] Semaphore Auto Pin Update
Description
This fix should result in less calls to
checkConflictingNodes()
which makes API calls toclient.Nodes.List()
which is a very expensive call. This should result in less strain during large clusters with node churn.I'm not sure if this should have a release note tied to it.
Release Notes