-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add per-node config infrastructure and per-node BGP peering #464
Add per-node config infrastructure and per-node BGP peering #464
Conversation
154c3db
to
d45d94b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okey doke, I've done a first pass and have some questions :)
// Get the names and the latest Node settings associated with the Key. | ||
_, resName, node, err := c.getNamesAndNodeFromKey(kvp.Key) | ||
if err != nil { | ||
logContext.WithError(err).Info("Error getting current settings when creating resource.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be an Error level log, right?
Plus "getting current settings" is a bit odd - isn't this more "Error looking up resource" or something like that?
}) | ||
logContext.Debug("Update per-Node resource") | ||
|
||
// Get the names and the latest Node settings associated with the Key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code (down until line 122) seems common with above - might it make sense to pull out into a helper?
logContext.WithError(err).Error("Error extracting annotation when updating resource") | ||
return nil, err | ||
} | ||
if len(kvps) != 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible we'll ever get say, len 2 back? I think probably not..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, and it'd be an error if we did.
for _, node := range nodes { | ||
nodeKVPs, err := c.extractFromAnnotations(&node, resName) | ||
if err != nil { | ||
logContext.WithField("NodeName", node.GetName()).WithError(err).Warning("Error listing resources for Node") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I wonder if we should fail the list request if this happens.
I think the contract should probably be that either it succeeds (and all the desired nodes are listed) or it fails and you get an error. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should always return what we can (otherwise stuffing up a single entry in a Node annotation could potentially take down all resources and render them useless.
My take would be, for all operations:
- If the node name is specified and the node annotation cannot be parsed at all due to corrupted data: return an error
- If the resource name is explicitly specified, and that specific entry within the annotation is corrupted: return an error
- If the resource name is not specified (which will only be the case for List) and some of the entries are corrupt (but not all of them) then return what entries within the annotation that we can. This would minimize the impact of someone having attempted to hand edit (badly) an annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all unexpected error cases - but I think it's nice to be able to handle as gracefully as we can these cases.
|
||
// getNamesAndNodeFromKey extracts the Node name and the Resource name from the Key | ||
// and gets the current Node resource config from the Kubernetes API. | ||
// Returns: the Node name, the Resource name, The Node resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to return the node Name and the Node itself?
The name is accessible from the returned Node.
// Get the current node settings. | ||
node, err := c.clientSet.Nodes().Get(nodeName, metav1.GetOptions{}) | ||
if err != nil { | ||
logContext.WithError(err).Info("Error getting Node configuration") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be an error log that says "Error getting Node from Kubernetes API" and include the node name.
logContext.Debug("Restrict results to only include requested resource name") | ||
val, ok := raw[reqName] | ||
raw = make(map[string]string, 0) | ||
if ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if reqName != ""
and !ok
, shouldn't that be an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is used for Listing too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. I see now.
var kvp *model.KVPair | ||
var err error | ||
for i := 0; i < maxActionRetries; i++ { | ||
if kvp, err = r.client.Update(object); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, how does this work?
If we get get a CAS error here, we'll need to re-read the object before writing, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup this wrapper is wrapped around the outside of the BGPPeer Client. So if that client filters up a "please retry this" error, this wrapper will re-invoke the Update on the BGPPeer Client (which will perform the Get/Modify/Update).
@@ -186,109 +189,3 @@ var _ = testutils.E2eDatastoreDescribe("BGPPeer tests", testutils.DatastoreEtcdV | |||
}) | |||
}) | |||
}) | |||
|
|||
// Perform CRUD operations on Global BGP Peer Resources | |||
var _ = testutils.E2eDatastoreDescribe("Global BGPPeer tests", testutils.DatastoreAll, func(config api.CalicoAPIConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did all of these guys go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The guys ^^ should be sufficient. Previously I couldn't flag the guys ^^ to be tested against KDD because of the lack of Per-Node BGP support. Now that I can, I can get rid of these guys down below which were basically a copy of the guys above (but without any Per-node BGP Peers).
@caseydavenport : Feedback applied (plus a tad more). I tweaked the interface for the custom-node-resource helper ... so that it the responsibility of the derived resource to know how to marshal and unmarshal between the annotations and the resource. Main reason for doing this is that we may (and I think probably do) need to support per-node felix and bgp config. I suspect in this case we might just want annotation entries such as:
and
rather than having a single entry that stores a dict of all the config values. Hmmm, I wonder if we should do that for the peers now that i think about it:
rather than
|
4e7c6fb
to
bbbeb8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think this is the last round from me. A bunch of nitty comments.
lib/backend/k8s/k8s_fv_test.go
Outdated
@@ -742,6 +742,163 @@ var _ = Describe("Test Syncer API for Kubernetes backend", func() { | |||
}) | |||
}) | |||
|
|||
It("should handle a CRUD of Node BGP Peer", func() { | |||
var kvp1, kvp1_2, kvp2, kvp2_2 *model.KVPair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooo, underscores in variable names.
Haven't seen that in a while :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay okay - i've changed to 1a,1b and 2a,2b.
lib/backend/k8s/k8s_fv_test.go
Outdated
Expect(kvps[0].Key).To(Equal(kvp1_2.Key)) | ||
Expect(kvps[0].Value).To(Equal(kvp1_2.Value)) | ||
Expect(kvps[1].Key).To(Equal(kvp2_2.Key)) | ||
Expect(kvps[01].Value).To(Equal(kvp2_2.Value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kvps[01]?
lib/backend/k8s/k8s_fv_test.go
Outdated
Expect(kvps[0].Value).To(Equal(kvp1_2.Value)) | ||
}) | ||
|
||
By("Deleting an existing Node BGP Peer", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one more which is "Deleting a non-existent Node BGP Peer"?
// Update the revision information from the response. | ||
kvp.Revision = resOut.GetObjectMeta().GetResourceVersion() | ||
return kvp, nil | ||
// Return the Key and Value with updated Revision information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a preference or does it do something meaningful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we don't really need it although it makes the testing easier. Reasoning is, if you call
kvp_out, err := client.Create(kvp_in)
You wouldn't expect kvp_in to change, and therefore you might wish to use it again. But as it stands we'll update kvp_in and just return it as kvp_out.
Anyhow, I've removed as it wasn't needed for this PR.
key, err := converter.NodeAndNameToKey("nodeA", "1-2-3-4") | ||
Expect(err).To(BeNil()) | ||
Expect(key).To(Equal(model.NodeBGPPeerKey{ | ||
Nodename: "nodeS", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, shouldn't this be nodeA
? Why are the tests passing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's just embarrassing - no test suite defined, so not actually running any of these tests. Will fix (and fix up broken tests).
}) | ||
logContext.Debug("Update per-Node resource") | ||
|
||
// Get the resource name and the latest Node settings associated with the Key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are "Node settings"? Does this mean "Node resource"?
} | ||
ak := c.nameToAnnotationKey(resName) | ||
|
||
// There should be no existing entry for a Create. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stale comment
nodeList, err := c.ClientSet.Nodes().List(metav1.ListOptions{}) | ||
if err != nil { | ||
logContext.WithError(err).Info("Failed to list resources: unable to list Nodes") | ||
err = K8sErrorToCalico(err, nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we combine this line into the return below?
} | ||
|
||
// getNameAndNodeFromKey extracts the resource name from the Key | ||
// and gets the current Node resource config from the Kubernetes API. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Node resource config" I think is just a complicated way of saying "Node" :)
|
||
// getNameAndNodeFromKey extracts the resource name from the Key | ||
// and gets the current Node resource config from the Kubernetes API. | ||
// Returns: the Resource name, The Node resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird capitalization.
@caseydavenport hopefully last iteration |
@robbrockbank feel free to merge after squashing 🤞 |
e9afc76
to
29df6fd
Compare
addig L7LogsURLCharLimit config setting
Should be ready for initial review.
This PR adds generic per-node configuration (so it could if necessary be used for per-node Felix config and per-node BGP config). Longer term the per-node stuff should disappear as we take a more selector based approach to configuration - but having the generic code should make it easier if we need to add the additional per-node config as an interim measure.
This also adds per-node BGP Peering and includes e2e tests that test against etcdv2 and kdd.
Note that due to the node being an oft-updated resource, it is necessary to retry the update if there is a conflict. I added a generic retryWrapper to handle this. Any resource where we are tacking Calico data onto another Kubernetes resource could benefit from being wrapped.