Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducing calls to checkConflictingNodes() #915

Merged
merged 1 commit into from
Jul 19, 2017

Conversation

heschlie
Copy link
Contributor

@heschlie heschlie commented Jul 14, 2017

Description

This fix should result in less calls to checkConflictingNodes() which makes API calls to client.Nodes.List() which is a very expensive call. This should result in less strain during large clusters with node churn.

I'm not sure if this should have a release note tied to it.

Release Notes

calico/node will now only check for conflicting Node IPs when initially getting an IP or when a change in IP is detected.  This should reduce the load on the cluster when a large number of nodes are restarting.

@heschlie heschlie added this to the Calico v2.4.0 milestone Jul 14, 2017
@heschlie heschlie self-assigned this Jul 14, 2017
// we need to update it and check for conflicts.
if node.Spec.BGP.IPv6Address == nil || !node.Spec.BGP.IPv6Address.IP.Equal(cidr.IP) {
node.Spec.BGP.IPv6Address = cidr
checkConflictingNodes(client, node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you potentially call this twice - once for ipv4 and once for ipv6.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gah you're right, which kinda defeats the purpose, will refactor it to only call once

@@ -84,14 +84,11 @@ func main() {
node := getNode(client, nodeName)

// Configure and verify the node IP addresses and subnets.
configureIPsAndSubnets(node)
configureIPsAndSubnets(client, node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit - rather than pass the client here it might be nicer to make it so configureIPsAndSubnets returns whether or not the an IP on the node changed, and we can use it below.

node.Spec.BGP.IPv4Address = cidr
// We autodetected an IPv4 address, if we have no previous IP, or it is different than the previous IP,
// we need to update it and check for conflicts.
if node.Spec.BGP.IPv4Address == nil || !node.Spec.BGP.IPv4Address.IP.Equal(cidr.IP) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this only checks the IP part of the CIDR, not the netmask.

I think we might want to set node.Spec.BGP.IPv4Address = cidr always - it's possible that the IP stays the same but the subnet changes.

@caseydavenport
Copy link
Member

In the original issue, there was also a suggestion that we add a config option to disable the check entirely - @heschlie could you add that to this PR?

@caseydavenport caseydavenport added the release-note-required Change has user-facing impact (no matter how small) label Jul 14, 2017
Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments.

@caseydavenport
Copy link
Member

@heschlie we'll need a corresponding docs PR to document the new configuration option. Could you link to it here once you've created it?

@heschlie
Copy link
Contributor Author

Does it need a separate PR or could/should it just be in this PR?

@caseydavenport
Copy link
Member

caseydavenport commented Jul 14, 2017

@heschlie oops, forgot we're in the Calico repo now!

It can/should share this PR.

@@ -267,7 +274,11 @@ func configureIPsAndSubnets(node *api.Node) {
adm := os.Getenv("IP_AUTODETECTION_METHOD")
cidr := autoDetectCIDR(adm, 4)
if cidr != nil {
// We autodetected an IPv4 address so update the value in the node.
// We autodetected an IPv4 address, if we have no previous IP, or it is different than the previous IP,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't notice before, but this check is not quite sufficient as it only covers the auto-detection case.

I think we also need to handle the case when the IP address has been explicitly specified in the environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that's true, I'll add similar checks there as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps easier to store the initial IPv4 and IPv6 addresses and do the comparison once at the end of the fn?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is wrong now that we've moved the check outside of this branch - should probably just be reverted to what it was.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for a few other comments in this PR.

if node.Spec.BGP.IPv6Address == nil || !node.Spec.BGP.IPv6Address.IP.Equal(envIp6.IP) {
checkConflicts = true
}
node.Spec.BGP.IPv6Address = envIp6
}
validateIP(node.Spec.BGP.IPv6Address)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be nicer and less code to store off the original value at the top of the function, and compare it to the new value at the bottom?

e.g.

oldv4 := node.Spec.BGP.IPv4Address
oldv6 := ...

// Rest of function.
...

if node.Spec.BGP.IPv4Addres != oldv4 ||  node.Spec.BGP.IPv6Address != oldv6 {
  ...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup I think I suggested the same thing earlier :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, missed that :)

Just driving by.

@heschlie heschlie force-pushed the fix-871 branch 2 times, most recently from 6950940 to 468b7bd Compare July 18, 2017 22:08
@heschlie
Copy link
Contributor Author

@caseydavenport @robbrockbank I think this should be good to go now, docs for new env var are in and the configureIPsAndSubnets() call simplified.

@@ -274,7 +285,8 @@ func configureIPsAndSubnets(node *api.Node) {
}
} else {
if ipv4Env != "" {
node.Spec.BGP.IPv4Address = parseIPEnvironment("IP", ipv4Env, 4)
envIp := parseIPEnvironment("IP", ipv4Env, 4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should revert this change - not required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for IPv6 below.

@caseydavenport
Copy link
Member

@heschlie I think we need a unit test for this.

if oldIpv4 == nil || !node.Spec.BGP.IPv4Address.IP.Equal(oldIpv4.IP) {
checkConflicts = true
}
if oldIpv6 == nil || !node.Spec.BGP.IPv6Address.IP.Equal(oldIpv6.IP) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an IPv4 deployment, oldIpv6 will always be nil every time we start, so we'll always perform the conflict check.

This should be:

if (oldIpv6 == nil && node.Spec.BGP.IPv6Address != nil) || (oldIpv6 != nil && !node.Spec.BGP.IPv6Address.IP.Equal(oldIpv6.IP))

or something like that

}
validateIP(node.Spec.BGP.IPv6Address)
}

// Detect if we've seen the IP address change, and flag that we need to check for conflicting Nodes
if oldIpv4 == nil || !node.Spec.BGP.IPv4Address.IP.Equal(oldIpv4.IP) {
checkConflicts = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do you really need to define checkConflicts? You could just return true here.

}
validateIP(node.Spec.BGP.IPv6Address)
}

// Detect if we've seen the IP address change, and flag that we need to check for conflicting Nodes
if oldIpv4 == nil || !node.Spec.BGP.IPv4Address.IP.Equal(oldIpv4.IP) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should log in these two branches.

@@ -16,6 +16,7 @@ The `calico/node` container is primarily configured through environment variable
| IP6 | The IPv6 address for Calico will bind to. When specified, the address is saved in the [node resource configuration]({{site.baseurl}}/{{page.version}}/reference/calicoctl/resources/node) for this host, overriding any previously configured value. When omitted, if an address has not yet been configured in the node resource, IPv6 routing is not enabled. When omitted, if an IPv6 address has been previously configured in the node resource, IPv6 is enabled using the already configured address. | IPv6 | |
| IP_AUTODETECTION_METHOD| The method to use to autodetect the IPv4 address for this host. This is only used when the IPv4 address is being autodetected. See [IP Autodetection methods](#ip-autodetection-methods) for details of the valid methods. | string | first-found |
| IP6_AUTODETECTION_METHOD| The method to use to autodetect the IPv6 address for this host. This is only used when the IPv6 address is being autodetected. See [IP Autodetection methods](#ip-autodetection-methods) for details of the valid methods. | string | first-found |
| DISABLE_NODE_IP_CHECK| Skips checks for duplicate Node IPs. This can reduce the load on the cluster when a large number of Nodes are restarting. | string | false |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string -> boolean


// If we report an IP change (v4 or v6) we should verify there are no
// conflicts between Nodes.
if checkConflicts && os.Getenv("DISABLE_NODE_IP_CHECK") != "true" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we should be more forgiving about the value that people can pass in - I think anything that resembles a boolean true should be sufficient: true t y 1 (case insensitive)

@caseydavenport @heschlie WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we respect for other true/false variables? Seems fine/good to do, but I'm happy to merge without.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm happy to merge without too. I'll raise an issue to track better handling of boolean envs.

@@ -56,7 +56,7 @@ def test_bgp_backends(self):
workload_host3 = host3.create_workload("workload3", network=network1, ip=DEFAULT_IPV4_ADDR_3)

# Allow network to converge
self.assert_true(workload_host1.check_can_ping(workload_host2.ip, retries=10))
self.assert_true(workload_host1.check_can_ping(workload_host2.ip, retries=15))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. I wonder why these changes are required.. have we regressed some sort of timing thing? Any ideas @robbrockbank?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea - these should not need to be updated for this PR. Perhaps local environment is slower?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to reduce the number of inconsistent test failures I was seeing, will revert as I don't think it made any difference.

@heschlie
Copy link
Contributor Author

I've Added in some UT's in a table driven test in the startup_test.go

@@ -236,7 +239,7 @@ func getNode(client *client.Client, nodeName string) *api.Node {

// configureIPsAndSubnets updates the supplied node resource with IP and Subnet
// information to use for BGP.
func configureIPsAndSubnets(node *api.Node) {
func configureIPsAndSubnets(node *api.Node) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final thing: I think the comment should indicate that it returns true if the node IP addresses are updated.

Copy link
Contributor

@robbrockbank robbrockbank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - one final comment, but I don't need to see it again.

Need to squash before merging.

@heschlie
Copy link
Contributor Author

I've updated the UT to do some IPv6 checks as well.

Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@heschlie could you please squash before merging? Thanks!

@caseydavenport
Copy link
Member

Oh, and could you tweak the release note to talk about "calico/node" instead of the startup script? That's more user-facing.

This fix should result in less calls to `checkConflictingNodes()` which
makes API calls to `client.Nodes.List()` which is a very expensive call.

Added in env car check to `DISABLE_NODE_IP_CHECK`, will skip conflict
checks if set to "true".

Document changes for new env var

Feedback update

- Minor fixes from feedback
- Reverted retry counts in STs
- Created some UTs for new functionality in `configureIPsAndSubnets()`

Testing IPv6 for changes

Rewrote `makeNode()` to create IPv6 as well, added some test cases for IPv6
@heschlie heschlie merged commit 765dc25 into projectcalico:master Jul 19, 2017
@caseydavenport caseydavenport mentioned this pull request Jul 31, 2017
3 tasks
caseydavenport pushed a commit that referenced this pull request Dec 14, 2021
…-release-v3.21

[release-v3.21] Semaphore Auto Pin Update
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-required Change has user-facing impact (no matter how small)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants