
Doc for 'preDNAT' policy flag #910

Merged
10 commits merged into projectcalico:master from pre-dnat-policy on Jul 25, 2017

Conversation

@nelljerram (Member) commented Jul 13, 2017

Description

Draft documentation for the forthcoming 'preDNAT' policy flag.

Todo

  • Review, revise and merge.
  • Write release note.

Release Note

Pre-DNAT Policy - a new flavor of Calico Policy that is enforced before any DNAT that a cluster node may perform (for example, by kube-proxy).  Pre-DNAT Policy is useful for securing the perimeter of a cluster against incoming traffic, except for pinholes expressed in terms of the particular IP addresses and/or ports that external clients are allowed to connect to.  For more information please see http://docs.projectcalico.org/v2.4/getting-started/bare-metal/bare-metal#pre-dnat-policy.
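
For readers who want to see the shape of this, a pre-DNAT policy is written like any other calicoctl v1 policy resource with the `preDNAT` flag set. A minimal sketch (the name, order, and CIDRs here are illustrative, not from this PR):

```yaml
# Pre-DNAT policy: enforced before any DNAT (e.g. by kube-proxy).
# Allows ingress from cluster-internal ranges; values are illustrative.
apiVersion: v1
kind: policy
metadata:
  name: allow-cluster-internal-ingress
spec:
  order: 10
  preDNAT: true
  ingress:
    - action: allow
      source:
        nets: [10.240.0.0/16, 192.168.0.0/16]
  selector: has(host-endpoint)
```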

@caseydavenport added this to the Calico v2.4.0 milestone Jul 13, 2017
@caseydavenport added the release-note-required label (change has user-facing impact, no matter how small) Jul 13, 2017

## Pre-DNAT policy

Policy for host endpoints can be marked as 'preDNAT'. This means that rules in
Contributor:

Very well written section. One suggestion: consider expanding the DNAT acronym once (if it's not already defined anywhere else) - I know not everyone is familiar with the term.

Member Author:

Thanks @bcreane, I'll look at doing that.

| preDNAT | Indicates to apply the rules in this policy before any DNAT. | true, false | boolean | false |

The `doNotTrack` and `preDNAT` fields are meaningful only when applying policy to a
[host endpoint]({{site.baseurl}}/{{page.version}}/reference/calicoctl/resources/hostendpoint).
Member:

I didn't realize that doNotTrack was limited to host endpoints today. Assuming that is true, I guess it would be OK to limit preDNAT to host endpoints too, but it would be good to better understand the implementation cost of supporting these options for all endpoint types. I'm not advocating expanding the scope of this PR, but perhaps we should have a follow-on PR for consideration for Calico 2.5?

Member Author:

Yes, doNotTrack has so far been limited to host endpoints. Also, I'm pretty sure that the current requirement for preDNAT only needs it to apply to host endpoints.

I think it would be a doable but non-trivial addition to support preDNAT for workload endpoints too - so I'm happy to consider this as a possible feature for 2.5. (I think we would need to:

  • make sure that the relevant workload endpoint chains are in the mangle table
  • program workload dispatch chains into the mangle table
  • add rules into Calico's mangle PREROUTING chain to dispatch to the workload chains)

Member:

Couple of thoughts:

  • There's a dataplane perf impact for every untracked rule that we add so I think there needs to be a clear requirement for workload untracked policy before we pay the per-packet cost for the dispatch chains.
  • It'd be hard to get good semantics for untracked policy for workloads. Since the raw PREROUTING chain is before the routing table and before NAT, we don't know whether the packets are workload packets at the point the rules are executed. Also, marking a packet as untracked takes it out of NAT so it breaks things like kube-proxy.
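
Since both flags apply only to host endpoints, it may help to recall what a host endpoint resource looks like in the v1 API. A minimal sketch (the name, node, label, and interface are illustrative):

```yaml
# Host endpoint for a node's external interface; policies select it
# via its labels (e.g. has(host-endpoint)). Values are illustrative.
apiVersion: v1
kind: hostEndpoint
metadata:
  name: node1-eth0
  node: node1
  labels:
    host-endpoint: ingress
spec:
  interfaceName: eth0
```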

@nelljerram requested review from fasaxc and tmjd July 18, 2017 16:54
@nelljerram (Member Author):

@fasaxc - Would appreciate your review of the new "When do host endpoint policies apply" section, in case you spot any mistakes.

@tmjd - Would appreciate your thoughts also, to see if you're happy that this meshes well with your #892

@tmjd (Member) left a comment:

One question and maybe a change. But other than that I think it looks good.

> overall result as if there was no untracked policy at all, so in practice we
> can describe this as untracked policies not applying to this flow.

- Pre-DNAT policies _do_ apply.
Member:

Does this mean that if I have a preDNAT policy that accepts all traffic for a NodePort, and a workload also listening on that same port, the traffic will be allowed to that workload even if I don't have a policy allowing it?
If so, we might want a note about that.

Member:

Hmm, that's an excellent point. Does the pre-DNAT policy do a "final" accept that skips workload policy? That could be very confusing!

Member Author:

Pre-DNAT policy doesn't know if it's going to a workload, because it's pre-routing.

@nelljerram (Member Author), Jul 19, 2017:

Note also that this is a key part of the feature - it works the same whether kube-proxy forwards to a local pod or to a remote pod.

Member Author:

(OK, I misunderstood the point here - let's talk.)

@nelljerram (Member Author):

@tmjd @fasaxc @lxpollitt I've added a lot to this doc and would appreciate your review again. In particular:

  • I believe I've addressed Erik's concern above, by clarifying that pre-DNAT policy and workload policy both apply, for traffic that comes in a host endpoint and goes to a local workload
  • I've added a complete worked example for a typical use case, and hope that it now demonstrates (a) that pre-DNAT policy will be effective in achieving that use case, and (b) that the policy configuration required of the admin is as simple as it could reasonably be.

Please do let me know what you think.

@fasaxc (Member) left a comment:

Great to get something down about all the different packet flows and a worked example, thanks!

A few thoughts inline.

Traffic from outside is addressed to any node's IP address, on a known
NodePort, and Kubernetes (kube-proxy) then DNATs that to the IP address of one
of the pods that provides the corresponding Service, and the relevant port
number on that port (which is usually different from the NodePort).
Member:

Typo: port number on that port
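
To connect the quoted passage to the policy model: a pre-DNAT pinhole for a NodePort matches on the pre-DNAT destination - the node's own address and the NodePort itself - not on the pod IP and port that kube-proxy rewrites it to. A minimal sketch (the `<node-port>` placeholder follows the doc's convention; the rest is illustrative):

```yaml
# Allow external traffic to a specific NodePort, before kube-proxy's DNAT.
apiVersion: v1
kind: policy
metadata:
  name: allow-nodeport
spec:
  order: 10
  preDNAT: true
  ingress:
    - action: allow
      protocol: tcp
      destination:
        ports: [<node-port>]
  selector: has(host-endpoint)
```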

reflecting that it is designed for the policing of incoming traffic from
outside the cluster:

1. Only the ingress rules of a pre-DNAT policy are enforced. If the policy has
Member:

Out of date: we now validate against egress policy being supplied.

(Whereas normal host endpoint policy is skipped, for traffic going to a
local workload.)

3. There is no 'default drop' semantic for pre-DNAT policy (as there is for
Member:

I think this only applies if the packet is heading to a pod/service. How about breaking this into cases for "packet terminated by host" (where the remaining normal policy then applies including default drop) and "packet forwarded on to workload or remote workload" (where the packet goes on its way)

Member Author:

As discussed, I think this is actually OK as is.

For packets that arrive on a host interface and are destined for a local
workload - i.e. a locally-hosted pod, container or VM:

- Untracked policies technically do apply, but never have any net positive
Member:

Bit of a shame to start this section with a negative "technically yes, but no". Can you pull the mainline useful cases to the top and push this one to the bottom (maybe even in a "Not recommended" section)?

Member Author:

To stay consistent, I'll move all of the (effectively-)non-applying cases to the end of their sections. I thought it was nice to have the really-applying ones in the order that they happen, though.

destination workload's ingress policy in this case.

For packets that arrive on a host interface and are destined for a local
server in the host namespace:
Member:

Would it be better to say server process, just to avoid the reader parsing it as server machine (as I did and had to backtrack)?
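
For contrast with pre-DNAT policy, a normal (tracked, non-preDNAT) host endpoint policy protecting such a server process might look like the following sketch. The SSH port is an illustrative choice, not from the PR; note that normal host endpoint policy carries the 'default drop' semantic discussed elsewhere in this conversation:

```yaml
# Normal host endpoint policy: allow SSH to processes in the host's
# own network namespace. Port and name are illustrative.
apiVersion: v1
kind: policy
metadata:
  name: allow-ssh-to-host
spec:
  order: 20
  ingress:
    - action: allow
      protocol: tcp
      destination:
        ports: [22]
  selector: has(host-endpoint)
```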


## Pre-DNAT policy: a worked example

Imagine a Kubernetes cluster whose administrator wants to secure it as much as
Member:

This is a great example, but a diagram would be a great help here; it was at the limit of my mental "stack" to follow!

It might help to point out that you're talking about a VIP that routes to several hosts for the load balancer; I didn't grok that at first, and it left me wondering why several hosts could have the same IP as the destination.
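
To make the VIP part concrete: the pinhole for load-balanced traffic matches on the load balancer's VIP as the (pre-DNAT) destination, which is why several nodes can legitimately see the same destination IP. A sketch (the VIP placeholder and the frontend ports are illustrative, not from the PR):

```yaml
# Allow external traffic addressed to the load balancer VIP, on the
# service's frontend ports, before any DNAT. Values are illustrative.
apiVersion: v1
kind: policy
metadata:
  name: allow-lb-vip
spec:
  order: 10
  preDNAT: true
  ingress:
    - action: allow
      protocol: tcp
      destination:
        nets: [<load-balancer-vip>/32]
        ports: [80, 443]
  selector: has(host-endpoint)
```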

@tmjd (Member) left a comment:

A few comments and one concern that I'd like you to double-check.
I like the diagram, but if it is easy to remove the 2nd row of Nodes, I think that would make it a little cleaner; I don't think anything is lost by removing them.

points about the application of pre-DNAT policy that make this work; especially
as pre-DNAT policy differs on these points from normal host endpoint policy.

Firstly, there is no 'default drop' semantic for pre-DNAT policy, like there
Member:

I'm glad you added this note about no 'default drop' for pre-DNAT. That consideration went through my head several times when I was first reading this doc so I'm glad you've explicitly stated it.

workload policy, for packets going to a local workload.)

For the example here, that means that the last pre-DNAT policy above does not
accidentally expose workloads that happen to use the same `<node-port>`, or
Member:

I did not think this was the case. I thought that if a packet is received (on the host interface) and a pre-DNAT policy accepts it, then it is passed on to the container and no further rules would matter. But I'm not very sure of my understanding in a case like this, so I could certainly be wrong. I would be interested in learning why it still passes through the workload chains, but that can wait till another time.

tl;dr If you're confident that, even when a pre-DNAT policy accepts a packet destined for a workload, the workload policy will still be applied and honored, then I'm good with this.

Member:

Yes, the behavior Neil describes is what we want (and is hopefully what he's implemented too).

Member Author:

@tmjd I think the way to think of this, in general, is that the pre-DNAT policy protects the cluster perimeter, and the workload policy protects the particular workload; and then I think it's obvious that there will be situations where both of those are wanted.
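
To illustrate that division of labor, a workload-scoped policy such as the following protects a particular workload regardless of what the pre-DNAT perimeter policy let through (a sketch; the labels and port are hypothetical):

```yaml
# Workload policy: applies on the workload's own interface, after any
# DNAT, so it composes with pre-DNAT perimeter policy. Hypothetical labels.
apiVersion: v1
kind: policy
metadata:
  name: protect-my-app
spec:
  order: 50
  ingress:
    - action: allow
      protocol: tcp
      source:
        selector: role == 'frontend'
      destination:
        ports: [8080]
  selector: app == 'my-app'
```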

NodePorts - i.e. as well-known TCP port numbers that appear to be available
on any node in the cluster.

- Most of those Services, however, should not be accessed via _any_ node, but
Member:

Before I read through the examples below I thought you were going to block nodes and pods from being able to access these NodePorts too (if the traffic was not destined for the LoadBalancer IP). I'd like to see it mentioned that the traffic being blocked is external traffic only.

Member Author:

Thanks; adding two more instances of "from outside the cluster", to clarify this.

@caseydavenport (Member):

@neiljerram in case you forgot, this PR needs a real release note soon

@nelljerram (Member Author):

@caseydavenport Release note filled in.

@tmjd Thanks for your review. I may be able to work on your comments during the weekend, but otherwise it will be Monday.

but has no pre-DNAT policies that explicitly allow or deny a particular
incoming packet, that packet is allowed to continue on its way, and will
then be accepted or dropped according to workload policy - if it is going to
a local workload - or to normal host endpoint policy - if not.
Member:

Would this punctuation be slightly easier to parse?
"then be accepted or dropped according to workload policy (if it is going to
a local workload) or to normal host endpoint policy (if not)."

Member Author:

Yes, thanks.

ingress:
- action: deny
  source:
    notNets: [<pod-cidr>, <cluster-internal-node-cidr>, ...]
Member:

This works / is correct, but I'm wondering if a different example using 2 policies to achieve this might be clearer instead:

  • The first policy is a normal (not preDNAT) policy that you would set up on a node in a k8s cluster if you are using hostEndpoints. This is the "allow source ...". You would do this independently of whether you were trying to lock down NodePorts.
  • The second policy is a preDNAT policy that locks down the NodePorts: "deny dest ".

WDYT?

Member Author:

  • I don't believe the "second policy ... that locks down the nodePorts" matches the use case that the rest of the narrative here describes, namely to lock down all external traffic and then open pinholes for particular node ports or load balancer ingress IPs.
  • I could perhaps adjust your suggestion for that, but given where we are w.r.t. the release, and that I'm not confident that I completely understand your suggestion, I'd rather leave this as is for now and use a new PR for further refinements.
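
For readers trying to follow the two-policy idea, one possible reading of the suggestion above is sketched below. This is an interpretation offered for illustration, not something agreed in the PR; the CIDR placeholders follow the doc's convention, and the NodePort range is the Kubernetes default:

```yaml
# Policy 1: normal host endpoint policy allowing cluster-internal sources.
apiVersion: v1
kind: policy
metadata:
  name: allow-cluster-internal
spec:
  order: 10
  ingress:
    - action: allow
      source:
        nets: [<pod-cidr>, <cluster-internal-node-cidr>]
  selector: has(host-endpoint)
---
# Policy 2: pre-DNAT policy denying external access to the NodePort range.
apiVersion: v1
kind: policy
metadata:
  name: deny-nodeports
spec:
  order: 20
  preDNAT: true
  ingress:
    - action: deny
      protocol: tcp
      destination:
        ports: ["30000:32767"]
  selector: has(host-endpoint)
```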


@caseydavenport (Member):

@neiljerram the v2.4 docs have been merged so we'll need to copy these changes over to that directory as well.

@nelljerram (Member Author):

@tmjd Many thanks for your review comments. I've addressed them specifically above, except for the one about the diagram. I think you're right that it could be cleaner without the second row of Nodes, but unfortunately it's not easy for me to change this now - I prepared it using Google Drawing and exported it as PNG, but forgot to keep it in an editable format like SVG! I still think this may be worth doing, but probably not before 2.4 goes out.

@nelljerram (Member Author):

@lxpollitt Thanks for your comments too, which I've addressed individually above.

@nelljerram merged commit 434944d into projectcalico:master Jul 25, 2017
@nelljerram deleted the pre-dnat-policy branch July 25, 2017 12:06