
Doc for 'preDNAT' policy flag #910

Merged
10 commits merged into projectcalico:master from pre-dnat-policy on Jul 25, 2017

Conversation

@nelljerram (Member) commented Jul 13, 2017

Description

Draft documentation for the forthcoming 'preDNAT' policy flag.

Todo

  • Review, revise and merge.
  • Write release note.

Release Note

Pre-DNAT Policy - a new flavor of Calico Policy that is enforced before any DNAT that a cluster node may perform (for example, by kube-proxy).  Pre-DNAT Policy is useful for securing the perimeter of a cluster against incoming traffic, except for pinholes expressed in terms of the particular IP addresses and/or ports that external clients are allowed to connect to.  For more information please see http://docs.projectcalico.org/v2.4/getting-started/bare-metal/bare-metal#pre-dnat-policy.
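
For readers who want to see the shape of this, a pre-DNAT policy is written like any other calicoctl v1 policy resource with the `preDNAT` flag set. A minimal sketch (the name, order, and CIDRs here are illustrative, not from this PR):

```yaml
# Pre-DNAT policy: enforced before any DNAT (e.g. by kube-proxy).
# Allows ingress from cluster-internal ranges; values are illustrative.
apiVersion: v1
kind: policy
metadata:
  name: allow-cluster-internal-ingress
spec:
  order: 10
  preDNAT: true
  ingress:
    - action: allow
      source:
        nets: [10.240.0.0/16, 192.168.0.0/16]
  selector: has(host-endpoint)
```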

@caseydavenport added this to the Calico v2.4.0 milestone Jul 13, 2017
@caseydavenport added the release-note-required label (change has user-facing impact, no matter how small) Jul 13, 2017

## Pre-DNAT policy

Policy for host endpoints can be marked as 'preDNAT'. This means that rules in
Contributor:

Very well written section. One suggestion: consider expanding the DNAT acronym once (if it's not already defined anywhere else) - I know not everyone is familiar with the term.

Member Author:

Thanks @bcreane, I'll look at doing that.

| preDNAT | Indicates to apply the rules in this policy before any DNAT. | true, false | boolean | false |

The `doNotTrack` and `preDNAT` fields are meaningful only when applying policy to a
[host endpoint]({{site.baseurl}}/{{page.version}}/reference/calicoctl/resources/hostendpoint).
Member:

I didn't realize that doNotTrack was limited to host endpoints today. Assuming that is true, I guess it would be OK to limit preDNAT to host endpoints too, but it would be good to better understand the implementation cost of supporting these options for all endpoint types. I'm not advocating expanding the scope of this PR, but perhaps we should have a follow-on PR for consideration for Calico 2.5?

Member Author:

Yes, doNotTrack has so far been limited to host endpoints. Also, I'm pretty sure that the current requirement for preDNAT only needs it to apply to host endpoints.

I think it would be a doable but non-trivial addition to support preDNAT for workload endpoints too - so I'm happy to consider this as a possible feature for 2.5. (I think we would need to:

  • make sure that the relevant workload endpoint chains are in the mangle table
  • program workload dispatch chains into the mangle table
  • add rules into Calico's mangle PREROUTING chain to dispatch to the workload chains)

Member:

Couple of thoughts:

  • There's a dataplane perf impact for every untracked rule that we add so I think there needs to be a clear requirement for workload untracked policy before we pay the per-packet cost for the dispatch chains.
  • It'd be hard to get good semantics for untracked policy for workloads. Since the raw PREROUTING chain is before the routing table and before NAT, we don't know whether the packets are workload packets at the point the rules are executed. Also, marking a packet as untracked takes it out of NAT so it breaks things like kube-proxy.
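
Since both flags apply only to host endpoints, it may help to recall what a host endpoint resource looks like in the v1 API. A minimal sketch (the name, node, label, and interface are illustrative):

```yaml
# Host endpoint for a node's external interface; policies select it
# via its labels (e.g. has(host-endpoint)). Values are illustrative.
apiVersion: v1
kind: hostEndpoint
metadata:
  name: node1-eth0
  node: node1
  labels:
    host-endpoint: ingress
spec:
  interfaceName: eth0
```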

@nelljerram requested review from fasaxc and tmjd July 18, 2017 16:54
@nelljerram (Member Author):

@fasaxc - Would appreciate your review of the new "When do host endpoint policies apply" section, in case you spot any mistakes.

@tmjd - Would appreciate your thoughts also, to see if you're happy that this meshes well with your #892

@tmjd (Member) left a comment:

One question and maybe a change. But other than that I think it looks good.

> overall result as if there was no untracked policy at all, so in practice we
> can describe this as untracked policies not applying to this flow.

- Pre-DNAT policies _do_ apply.
Member:

Does this mean that if I have a preDNAT policy that accepts all traffic for a NodePort, and a workload also listening on that same port, the traffic will be allowed to that workload even if I don't have a policy allowing it?
If so, we might want a note about that.

Member:

Hmm, that's an excellent point. Does the pre-DNAT policy do a "final" accept that skips workload policy? That could be very confusing!

Member Author:

Pre-DNAT policy doesn't know if it's going to a workload, because it's pre-routing.

@nelljerram (Member Author), Jul 19, 2017:

Note also that this is a key part of the feature - it works the same whether kube-proxy forwards to a local pod or to a remote pod.

Member Author:

(OK, I misunderstood the point here - let's talk.)

@nelljerram (Member Author):

@tmjd @fasaxc @lxpollitt I've added a lot to this doc and would appreciate your review again. In particular:

  • I believe I've addressed Erik's concern above, by clarifying that pre-DNAT policy and workload policy both apply, for traffic that comes in a host endpoint and goes to a local workload
  • I've added a complete worked example for a typical use case, and hope that it now demonstrates (a) that pre-DNAT policy will be effective in achieving that use case, and (b) that the policy configuration required of the admin is as simple as it could reasonably be.

Please do let me know what you think.

@fasaxc (Member) left a comment:

Great to get something down about all the different packet flows and a worked example, thanks!

A few thoughts inline.

Traffic from outside is addressed to any node's IP address, on a known
NodePort, and Kubernetes (kube-proxy) then DNATs that to the IP address of one
of the pods that provides the corresponding Service, and the relevant port
number on that port (which is usually different from the NodePort).
Member:

Typo: port number on that port
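
To connect the quoted passage to the policy model: a pre-DNAT pinhole for a NodePort matches on the pre-DNAT destination - the node's own address and the NodePort itself - not on the pod IP and port that kube-proxy rewrites it to. A minimal sketch (the `<node-port>` placeholder follows the doc's convention; the rest is illustrative):

```yaml
# Allow external traffic to a specific NodePort, before kube-proxy's DNAT.
apiVersion: v1
kind: policy
metadata:
  name: allow-nodeport
spec:
  order: 10
  preDNAT: true
  ingress:
    - action: allow
      protocol: tcp
      destination:
        ports: [<node-port>]
  selector: has(host-endpoint)
```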

reflecting that it is designed for the policing of incoming traffic from
outside the cluster:

1. Only the ingress rules of a pre-DNAT policy are enforced. If the policy has
Member:

Out of date: we now validate against egress policy being supplied.

(Whereas normal host endpoint policy is skipped, for traffic going to a
local workload.)

3. There is no 'default drop' semantic for pre-DNAT policy (as there is for
Member:

I think this only applies if the packet is heading to a pod/service. How about breaking this into cases for "packet terminated by host" (where the remaining normal policy then applies including default drop) and "packet forwarded on to workload or remote workload" (where the packet goes on its way)

Member Author:

As discussed, I think this is actually OK as is.

For packets that arrive on a host interface and are destined for a local
workload - i.e. a locally-hosted pod, container or VM:

- Untracked policies technically do apply, but never have any net positive
Member:

Bit of a shame to start this section with a negative "technically yes, but no". Can you pull the mainline useful cases to the top and push this one to the bottom (maybe even in a "Not recommended" section)?

Member Author:

To stay consistent, I'll move all of the (effectively-)non-applying cases to the end of their sections. I thought it was nice to have the really-applying ones in the order that they happen, though.

destination workload's ingress policy in this case.

For packets that arrive on a host interface and are destined for a local
server in the host namespace:
Member:

Would it be better to say server process, just to avoid the reader parsing it as server machine (as I did and had to backtrack)?
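
For contrast with pre-DNAT policy, a normal (tracked, non-preDNAT) host endpoint policy protecting such a server process might look like the following sketch. The SSH port is an illustrative choice, not from the PR; note that normal host endpoint policy carries the 'default drop' semantic discussed elsewhere in this conversation:

```yaml
# Normal host endpoint policy: allow SSH to processes in the host's
# own network namespace. Port and name are illustrative.
apiVersion: v1
kind: policy
metadata:
  name: allow-ssh-to-host
spec:
  order: 20
  ingress:
    - action: allow
      protocol: tcp
      destination:
        ports: [22]
  selector: has(host-endpoint)
```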


## Pre-DNAT policy: a worked example

Imagine a Kubernetes cluster whose administrator wants to secure it as much as
Member:

This is a great example, but a diagram would be a great help here; it was at the limit of my mental "stack" to follow!

It might help to point out that you're talking about a VIP that routes to several hosts for the load balancer; I didn't grok that at first, and it left me wondering why several hosts could have the same IP as the destination.
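
To make the VIP part concrete: the pinhole for load-balanced traffic matches on the load balancer's VIP as the (pre-DNAT) destination, which is why several nodes can legitimately see the same destination IP. A sketch (the VIP placeholder and the frontend ports are illustrative, not from the PR):

```yaml
# Allow external traffic addressed to the load balancer VIP, on the
# service's frontend ports, before any DNAT. Values are illustrative.
apiVersion: v1
kind: policy
metadata:
  name: allow-lb-vip
spec:
  order: 10
  preDNAT: true
  ingress:
    - action: allow
      protocol: tcp
      destination:
        nets: [<load-balancer-vip>/32]
        ports: [80, 443]
  selector: has(host-endpoint)
```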

@tmjd (Member) left a comment:

A few comments and one concern that I'd like you to double-check.
I like the diagram, but if it is easy to remove the 2nd row of Nodes, I think that would make it a little cleaner; I don't think anything is lost by removing them.

points about the application of pre-DNAT policy that make this work; especially
as pre-DNAT policy differs on these points from normal host endpoint policy.

Firstly, there is no 'default drop' semantic for pre-DNAT policy, like there
Member:

I'm glad you added this note about no 'default drop' for pre-DNAT. That consideration went through my head several times when I was first reading this doc so I'm glad you've explicitly stated it.

workload policy, for packets going to a local workload.)

For the example here, that means that the last pre-DNAT policy above does not
accidentally expose workloads that happen to use the same `<node-port>`, or
Member:

I did not think this was the case. I thought that if a packet is received (on the host interface) and a pre-DNAT policy accepts it, then it is passed on to the container and no further rules would matter. But I'm not very sure of my understanding in a case like this, so I could certainly be wrong. I would be interested in learning why it still passes through the workload chains, but that can wait till another time.

tl;dr If you're confident that, even when a pre-DNAT policy accepts a packet destined for a workload, the workload policy will still be applied and honored, then I'm good with this.

Member:

Yes, the behavior Neil describes is what we want (and is hopefully what he's implemented too).

Member Author:

@tmjd I think the way to think of this, in general, is that the pre-DNAT policy protects the cluster perimeter, and the workload policy protects the particular workload; and then I think it's obvious that there will be situations where both of those are wanted.
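
To illustrate that division of labor, a workload-scoped policy such as the following protects a particular workload regardless of what the pre-DNAT perimeter policy let through (a sketch; the labels and port are hypothetical):

```yaml
# Workload policy: applies on the workload's own interface, after any
# DNAT, so it composes with pre-DNAT perimeter policy. Hypothetical labels.
apiVersion: v1
kind: policy
metadata:
  name: protect-my-app
spec:
  order: 50
  ingress:
    - action: allow
      protocol: tcp
      source:
        selector: role == 'frontend'
      destination:
        ports: [8080]
  selector: app == 'my-app'
```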

NodePorts - i.e. as well-known TCP port numbers that appear to be available
on any node in the cluster.

- Most of those Services, however, should not be accessed via _any_ node, but
Member:

Before I read through the examples below I thought you were going to block nodes and pods from being able to access these NodePorts too (if the traffic was not destined for the LoadBalancer IP). I'd like to see it mentioned that the traffic being blocked is external traffic only.

Member Author:

Thanks; adding two more instances of "from outside the cluster", to clarify this.

@caseydavenport (Member):

@neiljerram in case you forgot, this PR needs a real release note soon

@nelljerram (Member Author):

@caseydavenport Release note filled in.

@tmjd Thanks for your review. I may be able to work on your comments during the weekend, but otherwise it will be Monday.

but has no pre-DNAT policies that explicitly allow or deny a particular
incoming packet, that packet is allowed to continue on its way, and will
then be accepted or dropped according to workload policy - if it is going to
a local workload - or to normal host endpoint policy - if not.
Member:

Would this punctuation be slightly easier to parse?
"then be accepted or dropped according to workload policy (if it is going to
a local workload) or to normal host endpoint policy (if not)."

Member Author:

Yes, thanks.

ingress:
- action: deny
  source:
    notNets: [<pod-cidr>, <cluster-internal-node-cidr>, ...]
Member:

This works / is correct, but I'm wondering if a different example using 2 policies to achieve this might be clearer instead:

  • The first policy is a normal (not preDNAT) policy that you would set up on a node in a k8s cluster if you are using hostEndpoints. This is the "allow source ...". You would do this independently of whether you were trying to lock down NodePorts.
  • The second policy is a preDNAT policy that locks down the NodePorts: "deny dest ".

WDYT?

Member Author:

  • I don't believe the "second policy ... that locks down the nodePorts" matches the use case that the rest of the narrative here describes, namely to lock down all external traffic and then open pinholes for particular node ports or load balancer ingress IPs.
  • I could perhaps adjust your suggestion for that, but given where we are w.r.t. the release, and that I'm not confident that I completely understand your suggestion, I'd rather leave this as is for now and use a new PR for further refinements.
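
For readers trying to follow the two-policy idea, one possible reading of the suggestion above is sketched below. This is an interpretation offered for illustration, not something agreed in the PR; the CIDR placeholders follow the doc's convention, and the NodePort range is the Kubernetes default:

```yaml
# Policy 1: normal host endpoint policy allowing cluster-internal sources.
apiVersion: v1
kind: policy
metadata:
  name: allow-cluster-internal
spec:
  order: 10
  ingress:
    - action: allow
      source:
        nets: [<pod-cidr>, <cluster-internal-node-cidr>]
  selector: has(host-endpoint)
---
# Policy 2: pre-DNAT policy denying external access to the NodePort range.
apiVersion: v1
kind: policy
metadata:
  name: deny-nodeports
spec:
  order: 20
  preDNAT: true
  ingress:
    - action: deny
      protocol: tcp
      destination:
        ports: ["30000:32767"]
  selector: has(host-endpoint)
```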


@caseydavenport (Member):

@neiljerram the v2.4 docs have been merged so we'll need to copy these changes over to that directory as well.

@nelljerram (Member Author):

@tmjd Many thanks for your review comments. I've addressed them specifically above, except for the one about the diagram. I think you're right that it could be cleaner without the second row of Nodes, but unfortunately it's not easy for me to change this now - I prepared it using Google Drawing and exported it as PNG, but forgot to keep it in an editable format like SVG! I still think this may be worth doing, but probably not before 2.4 goes out.

@nelljerram (Member Author):

@lxpollitt Thanks for your comments too, which I've addressed individually above.

@nelljerram merged commit 434944d into projectcalico:master Jul 25, 2017
@nelljerram deleted the pre-dnat-policy branch July 25, 2017 12:06