Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add per-node config infrastructure and per-node BGP peering #464

Merged
merged 1 commit into from
Jul 6, 2017

Conversation

robbrockbank
Copy link
Contributor

@robbrockbank robbrockbank commented Jun 28, 2017

Should be ready for initial review.

This PR adds generic per-node configuration (so it could if necessary be used for per-node Felix config and per-node BGP config). Longer term the per-node stuff should disappear as we take a more selector based approach to configuration - but having the generic code should make it easier if we need to add the additional per-node config as an interim measure.

This also adds per-node BGP Peering and includes e2e tests that test against etcdv2 and kdd.

Note that due to the node being an oft-updated resource, it is necessary to retry the update if there is a conflict. I added a generic retryWrapper to handle this. Any resource where we are tacking Calico data onto another Kubernetes resource could benefit from being wrapped.

@robbrockbank robbrockbank force-pushed the per-node-config branch 5 times, most recently from 154c3db to d45d94b Compare June 28, 2017 07:09
@robbrockbank robbrockbank changed the title [WIP] Add per-node config Add per-node config infrastructure and per-node BGP peering Jun 29, 2017
Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey doke, I've done a first pass and have some questions :)

// Get the names and the latest Node settings associated with the Key.
_, resName, node, err := c.getNamesAndNodeFromKey(kvp.Key)
if err != nil {
logContext.WithError(err).Info("Error getting current settings when creating resource.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be an Error level log, right?

Plus "getting current settings" is a bit odd - isn't this more "Error looking up resource" or something like that?

})
logContext.Debug("Update per-Node resource")

// Get the names and the latest Node settings associated with the Key.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code (down until line 122) seems common with above - might it make sense to pull out into a helper?

logContext.WithError(err).Error("Error extracting annotation when updating resource")
return nil, err
}
if len(kvps) != 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible we'll ever get say, len 2 back? I think probably not..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, and it'd be an error if we did.

for _, node := range nodes {
nodeKVPs, err := c.extractFromAnnotations(&node, resName)
if err != nil {
logContext.WithField("NodeName", node.GetName()).WithError(err).Warning("Error listing resources for Node")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wonder if we should fail the list request if this happens.

I think the contract should probably be that either it succeeds (and all the desired nodes are listed) or it fails and you get an error. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should always return what we can (otherwise stuffing up a single entry in a Node annotation could potentially take down all resources and render them useless.

My take would be, for all operations:

  • If the node name is specified and the node annotation cannot be parsed at all due to corrupted data: return an error
  • If the resource name is explicitly specified, and that specific entry within the annotation is corrupted: return an error
  • If the resource name is not specified (which will only be the case for List) and some of the entries are corrupt (but not all of them) then return what entries within the annotation that we can. This would minimize the impact of someone having attempted to hand edit (badly) an annotation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all unexpected error cases - but I think it's nice to be able to handle as gracefully as we can these cases.


// getNamesAndNodeFromKey extracts the Node name and the Resource name from the Key
// and gets the current Node resource config from the Kubernetes API.
// Returns: the Node name, the Resource name, The Node resource.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to return the node Name and the Node itself?

The name is accessible from the returned Node.

// Get the current node settings.
node, err := c.clientSet.Nodes().Get(nodeName, metav1.GetOptions{})
if err != nil {
logContext.WithError(err).Info("Error getting Node configuration")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be an error log that says "Error getting Node from Kubernetes API" and include the node name.

logContext.Debug("Restrict results to only include requested resource name")
val, ok := raw[reqName]
raw = make(map[string]string, 0)
if ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if reqName != "" and !ok, shouldn't that be an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is used for Listing too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok. I see now.

var kvp *model.KVPair
var err error
for i := 0; i < maxActionRetries; i++ {
if kvp, err = r.client.Update(object); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, how does this work?

If we get get a CAS error here, we'll need to re-read the object before writing, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup this wrapper is wrapped around the outside of the BGPPeer Client. So if that client filters up a "please retry this" error, this wrapper will re-invoke the Update on the BGPPeer Client (which will perform the Get/Modify/Update).

@@ -186,109 +189,3 @@ var _ = testutils.E2eDatastoreDescribe("BGPPeer tests", testutils.DatastoreEtcdV
})
})
})

// Perform CRUD operations on Global BGP Peer Resources
var _ = testutils.E2eDatastoreDescribe("Global BGPPeer tests", testutils.DatastoreAll, func(config api.CalicoAPIConfig) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did all of these guys go?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guys ^^ should be sufficient. Previously I couldn't flag the guys ^^ to be tested against KDD because of the lack of Per-Node BGP support. Now that I can, I can get rid of these guys down below which were basically a copy of the guys above (but without any Per-node BGP Peers).

@robbrockbank
Copy link
Contributor Author

robbrockbank commented Jun 30, 2017

@caseydavenport : Feedback applied (plus a tad more).

I tweaked the interface for the custom-node-resource helper ... so that it the responsibility of the derived resource to know how to marshal and unmarshal between the annotations and the resource.

Main reason for doing this is that we may (and I think probably do) need to support per-node felix and bgp config. I suspect in this case we might just want annotation entries such as:

felix.projectcalico.org/<config>=<value>

and

bgp.projectcalico.org/<config>=<value>

rather than having a single entry that stores a dict of all the config values.

Hmmm, I wonder if we should do that for the peers now that i think about it:

bgppeer.projectcalico.org/peer1=<marshalled value1>
bgppeer.projectcalico.org/peer2=<marshalled value2>
bgppeer.projectcalico.org/peer3=<marshalled value3>

rather than

projectcalico.org/bgppeers=<marshalled dictionary of peers>

Copy link
Member

@caseydavenport caseydavenport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think this is the last round from me. A bunch of nitty comments.

@@ -742,6 +742,163 @@ var _ = Describe("Test Syncer API for Kubernetes backend", func() {
})
})

It("should handle a CRUD of Node BGP Peer", func() {
var kvp1, kvp1_2, kvp2, kvp2_2 *model.KVPair
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooo, underscores in variable names.

Haven't seen that in a while :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay okay - i've changed to 1a,1b and 2a,2b.

Expect(kvps[0].Key).To(Equal(kvp1_2.Key))
Expect(kvps[0].Value).To(Equal(kvp1_2.Value))
Expect(kvps[1].Key).To(Equal(kvp2_2.Key))
Expect(kvps[01].Value).To(Equal(kvp2_2.Value))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kvps[01]?

Expect(kvps[0].Value).To(Equal(kvp1_2.Value))
})

By("Deleting an existing Node BGP Peer", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one more which is "Deleting a non-existent Node BGP Peer"?

// Update the revision information from the response.
kvp.Revision = resOut.GetObjectMeta().GetResourceVersion()
return kvp, nil
// Return the Key and Value with updated Revision information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a preference or does it do something meaningful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we don't really need it although it makes the testing easier. Reasoning is, if you call

kvp_out, err := client.Create(kvp_in)

You wouldn't expect kvp_in to change, and therefore you might wish to use it again. But as it stands we'll update kvp_in and just return it as kvp_out.

Anyhow, I've removed as it wasn't needed for this PR.

key, err := converter.NodeAndNameToKey("nodeA", "1-2-3-4")
Expect(err).To(BeNil())
Expect(key).To(Equal(model.NodeBGPPeerKey{
Nodename: "nodeS",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, shouldn't this be nodeA? Why are the tests passing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's just embarrassing - no test suite defined, so not actually running any of these tests. Will fix (and fix up broken tests).

})
logContext.Debug("Update per-Node resource")

// Get the resource name and the latest Node settings associated with the Key.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are "Node settings"? Does this mean "Node resource"?

}
ak := c.nameToAnnotationKey(resName)

// There should be no existing entry for a Create.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stale comment

nodeList, err := c.ClientSet.Nodes().List(metav1.ListOptions{})
if err != nil {
logContext.WithError(err).Info("Failed to list resources: unable to list Nodes")
err = K8sErrorToCalico(err, nodeName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we combine this line into the return below?

}

// getNameAndNodeFromKey extracts the resource name from the Key
// and gets the current Node resource config from the Kubernetes API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Node resource config" I think is just a complicated way of saying "Node" :)


// getNameAndNodeFromKey extracts the resource name from the Key
// and gets the current Node resource config from the Kubernetes API.
// Returns: the Resource name, The Node resource.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird capitalization.

@robbrockbank
Copy link
Contributor Author

@caseydavenport hopefully last iteration

@caseydavenport
Copy link
Member

@robbrockbank feel free to merge after squashing 🤞

@robbrockbank robbrockbank merged commit 2990e44 into projectcalico:master Jul 6, 2017
@robbrockbank robbrockbank deleted the per-node-config branch July 6, 2017 17:59
song-jiang pushed a commit to song-jiang/libcalico-go that referenced this pull request Jul 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants