Doc for 'preDNAT' policy flag #910

Merged: 10 commits, Jul 25, 2017
Binary file added images/bare-metal-example.png
307 changes: 296 additions & 11 deletions master/getting-started/bare-metal/bare-metal.md
However, for host endpoints, Calico is more lenient; it only polices
traffic to/from interfaces that it's been explicitly told about. Traffic
to/from other interfaces is left alone.

As of Calico v2.1.0, Calico applies host endpoint security policy both to traffic
that is terminated locally, and to traffic that is forwarded between host
endpoints. Previously, policy was only applied to traffic that was terminated
locally. The change allows Calico to be used to secure a NAT gateway or router.
Calico supports selector-based policy as normal when running on a gateway or router,
allowing for rich, dynamic security policy based on the labels attached to your
workloads.

> **NOTE**
> …

## Untracked policy

Policy for host endpoints can be marked as 'doNotTrack'. This means that rules
in that policy should be applied before any data plane connection tracking, and
that packets allowed by these rules should not be tracked.

Untracked policy is designed for allowing untracked connections to a server
process running directly on a host - where by 'directly' we mean _not_ in a
pod/VM/container workload. A typical scenario for using 'doNotTrack' policy
would be a server, running directly on a host, that accepts a very high rate of
short-lived connections, such as `memcached`. On Linux, if those connections
are tracked, the conntrack table can fill up and then Linux may drop packets
for further connection attempts, meaning that those newer connections will
fail. If you are using Calico to secure that server's host, you can avoid this
problem by defining a policy that allows access to the server's ports and is
marked as 'doNotTrack'.

Since there is no connection tracking for a 'doNotTrack' policy, it is
important that the policy's ingress and egress rules are specified
symmetrically. For example, for a server on port 999, the policy must include
an ingress rule allowing access *to* port 999 and an egress rule allowing
outbound traffic *from* port 999. (Whereas for a connection tracked policy, it
is usually enough to specify the ingress rule only, and then connection
tracking will automatically allow the return path.)

Because of how untracked policy is implemented, untracked ingress rules apply
to all incoming traffic through a host endpoint - regardless of where that
traffic is going - but untracked egress rules only apply to traffic that is
sent from the host itself (not from a local workload) out of that host
endpoint.

## Pre-DNAT policy

Policy for host endpoints can be marked as 'preDNAT'. This means that rules in
that policy should be applied before any DNAT (Destination Network Address
Translation), which is useful if it is more convenient to specify Calico policy
in terms of a packet's original destination IP address and port than in terms
of that packet's destination IP address and port after it has been DNAT'd.

An example is securing access to Kubernetes NodePorts from outside the cluster.
Traffic from outside is addressed to any node's IP address, on a known
NodePort, and Kubernetes (kube-proxy) then DNATs that to the IP address of one
of the pods that provides the corresponding Service, and the relevant port
number on that pod (which is usually different from the NodePort).

As NodePorts are the externally advertised way of connecting to Services (and a
NodePort uniquely identifies a Service, whereas an internal port number may
not), it makes sense to express Calico policy to expose or secure particular
Services in terms of the corresponding NodePorts. But that is only possible if
the Calico policy is applied before DNAT changes the NodePort to something
else - and hence this kind of policy needs 'preDNAT' set to true.
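As a first flavor of the flag, here is a minimal sketch (the policy name and
selector are illustrative; the worked example later in this document builds
fuller versions):

```
apiVersion: v1
kind: policy
metadata:
  name: allow-nodeport
spec:
  preDNAT: true
  order: 90
  ingress:
    # Matches the externally advertised NodePort, because this policy is
    # enforced before kube-proxy's DNAT rewrites the destination.
    - action: allow
      protocol: tcp
      destination:
        ports: [<node-port>]
  selector: has(host-endpoint)
```

Because the match happens pre-DNAT, `<node-port>` here is the NodePort itself,
not the backing pod's port.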

In addition to being applied before any DNAT, the enforcement of pre-DNAT
policy differs from that of normal host endpoint policy in three key details,
reflecting that it is designed for the policing of incoming traffic from
outside the cluster:

1. Pre-DNAT policy may only have ingress rules, not egress. (When incoming
traffic is allowed by the ingress rules, standard connection tracking is
sufficient to allow the return path traffic.)

2. Pre-DNAT policy is enforced for all traffic arriving through a host
endpoint, regardless of where that traffic is going, and - in particular -
even if that traffic is routed to a local workload on the same host.
(Whereas normal host endpoint policy is skipped, for traffic going to a
local workload.)

3. There is no 'default drop' semantic for pre-DNAT policy (as there is for
normal host endpoint policy). In other words, if a host endpoint is defined
but has no pre-DNAT policies that explicitly allow or deny a particular
incoming packet, that packet is allowed to continue on its way, and will
then be accepted or dropped according to workload policy (if it is going to
a local workload) or to normal host endpoint policy (if not).

**Review comment (Member):** I think this only applies if the packet is heading to a pod/service. How about breaking this into cases for "packet terminated by host" (where the remaining normal policy then applies, including default drop) and "packet forwarded on to workload or remote workload" (where the packet goes on its way)?

**Reply (Member Author):** As discussed, think this is actually OK as is.

## When do host endpoint policies apply?

As stated above, normal host endpoint policies apply to traffic that arrives on
and/or is sent to a host interface, except if that traffic comes from or is
destined for a workload on the same host; but the rules for applying untracked
and pre-DNAT policies are different in some cases. Here we present and
summarize all of those rules together, for all possible flows and all types of
host endpoint policy.

For packets that arrive on a host interface and are destined for a local
workload - i.e. a locally-hosted pod, container or VM:

- Pre-DNAT policies apply.

- Normal policies do not apply - by design, because Calico enforces the
destination workload's ingress policy in this case.

- Untracked policies technically do apply, but never have any net positive
effect for such flows.

> **NOTE**
>
> To be precise, untracked policy for the incoming host interface may apply
> in the forwards direction, and if so it will have the effect of forwarding
> the packet to the workload without any connection tracking. But then, in
> the reverse direction, there will be no conntrack state for the return
> packets to match, and there is no application of any egress rules that may
> be defined by the untracked policy - so unless the workload's policy
> specifically allows the relevant source IP, the return packet will be
> dropped. That is the same overall result as if there was no untracked
> policy at all, so in practice it is as if untracked policies do not apply
> to this flow.

For packets that arrive on a host interface and are destined for a local
server process in the host namespace:

- Untracked, pre-DNAT and normal policies all apply.

- If a packet is explicitly allowed by untracked policy, it skips over any
pre-DNAT and normal policy.

- If a packet is explicitly allowed by pre-DNAT policy, it skips over any
normal policy.

For packets that arrive on a host interface (A) and are forwarded out of the
same or another host interface (B):

- Untracked policies apply, for both host interfaces A and B, but only the
ingress rules that are defined in those policies. The forwards direction is
governed by the ingress rules of untracked policies that apply to interface
A, and the reverse direction is governed by the ingress rules of untracked
policies that apply to interface B, so those rules should be defined
symmetrically.

- Pre-DNAT policies apply, specifically the ingress rules of the pre-DNAT
policies that apply to interface A. (The reverse direction is allowed by
conntrack state.)

- Normal policies apply, specifically the ingress rules of the normal policies
that apply to interface A, and the egress rules of the normal policies that
apply to interface B. (The reverse direction is allowed by conntrack state.)

- If a packet is explicitly allowed by untracked policy, it skips over any
pre-DNAT and normal policy.

- If a packet is explicitly allowed by pre-DNAT policy, it skips over any
normal policy.

For packets that are sent from a local server process (in the host namespace)
out of a host interface:

- Untracked policies apply, specifically the egress rules of the untracked
policies that apply to the host interface.

- Normal policies apply, specifically the egress rules of the normal policies
that apply to that host interface.

- Pre-DNAT policies do not apply.

For packets that are sent from a local workload out of a host interface:

- No host endpoint policies apply.

## Pre-DNAT policy: a worked example

Imagine a Kubernetes cluster whose administrator wants to secure it as much as
possible against incoming traffic from outside the cluster. Let's suppose that:

- The cluster provides various useful Services that are exposed as Kubernetes
NodePorts - i.e. as well-known TCP port numbers that appear to be available
on any node in the cluster.

- Most of those Services, however, should not be accessed from outside the
cluster via _any_ node, but instead via a LoadBalancer IP that is routable
from outside the cluster and maps to one of just a few 'ingress' nodes. (The
LoadBalancer IP is a virtual IP that, at any given time, gets routed somehow
to one of those 'ingress' nodes.)

- For a few Services, on the other hand, there is no LoadBalancer IP set up, so
those Services should be accessible from outside the cluster through their
NodePorts on any node.

- All other incoming traffic from outside the cluster should be disallowed.

![]({{site.baseurl}}/images/bare-metal-example.png)

**Review comment (Member):** This is a great example but a diagram would be a great help here, it was at the limit of my mental "stack" to follow!

It might help to point out that you're talking about a VIP that routes to several hosts for the load balancer; I didn't grok that at first; left me wondering why several hosts could have the same IP in the dest.

For each Service in the first set, we want to allow traffic from outside the
cluster that is addressed to `<service-load-balancer-ip>:<service-port>`, but
only when it enters the cluster through one of the 'ingress' nodes. For each
Service in the second set, we want to allow traffic from outside the cluster
that is addressed to `<node-ip>:<service-node-port>`, via any node.

We can do this by applying Calico pre-DNAT policy to the external interfaces of
each cluster node. We use pre-DNAT policy, rather than normal host endpoint
policy, for two reasons:

1. Normal host endpoint policy is not enforced for incoming traffic to a local
pod, whereas pre-DNAT policy is enforced for _all_ incoming traffic. Here
we want to police all incoming traffic from outside the cluster, regardless
of its destination, so pre-DNAT is the right choice.

2. We want to express our policy in terms of the external port numbers
`<service-port>` and `<service-node-port>`. The kube-proxy on the ingress
node will use DNATs to change those port numbers (and IP addresses) to those
of one of the pods that backs the relevant Service. Our policy therefore
needs to be enforced _before_ those DNATs, and of course that is exactly
what pre-DNAT policy is for.

Let's begin with the policy to disallow incoming traffic by default. Every
outward interface of each node, by which traffic from outside could possibly
enter the cluster, must be defined as a Calico host endpoint; for example, for
`eth0` on `node1`:

```
apiVersion: v1
kind: hostEndpoint
metadata:
  name: node1-eth0
  node: node1
  labels:
    host-endpoint: ingress
spec:
  interfaceName: eth0
```

The nodes that are allowed as load balancer ingress nodes should have an
additional label to indicate that, let's say `load-balancer-ingress: true`.
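For instance, for one of those nodes (a sketch; `node2` and `eth0` are
illustrative, and the label value is quoted so that it is stored as the string
`'true'`, matching the selector used further below):

```
apiVersion: v1
kind: hostEndpoint
metadata:
  name: node2-eth0
  node: node2
  labels:
    host-endpoint: ingress
    load-balancer-ingress: "true"
spec:
  interfaceName: eth0
```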

Then we can deny all incoming traffic through those interfaces, unless it is
from a source IP that is known to be within the cluster. (Note: we are
assuming that the same interfaces can also be used for traffic that is
forwarded from other nodes or pods in the cluster - as would be the case for
nodes with only one external interface.)

```
apiVersion: v1
kind: policy
metadata:
  name: disallow-incoming
spec:
  preDNAT: true
  order: 100
  ingress:
    - action: deny
      source:
        notNets: [<pod-cidr>, <cluster-internal-node-cidr>, ...]
  selector: host-endpoint=='ingress'
```

**Review comment (Member):** This works / is correct, but I'm wondering if a different example using 2 policies to achieve this might be clearer instead:

- First policy is a normal (not preDNAT) policy that you would set up on a node in a k8s cluster if you are using hostEndpoints. This is the "allow source ...". You would do this independent of whether you were trying to lock down nodePorts.
- Second policy is a preDNAT policy that locks down the nodePorts "deny dest ".

WDYT?

**Reply (Member Author):**

- I don't believe the "second policy ... that locks down the nodePorts" matches the use case that the rest of the narrative here describes, namely to lock down all external traffic and then open pinholes for particular node ports or load balancer ingress IPs.
- I could perhaps adjust your suggestion for that, but given where we are w.r.t. the release, and that I'm not confident that I completely understand your suggestion, I'd rather leave this as is for now and use a new PR for further refinements.

Now, to allow traffic through the load balancer ingress nodes to
`<service-load-balancer-ip>:<service-port>` (for each load-balanced Service):

```
apiVersion: v1
kind: policy
metadata:
  name: allow-load-balancer-service-1
spec:
  preDNAT: true
  order: 90
  ingress:
    # NodePorts and Service ports here are TCP, and a ports match needs an
    # explicit protocol.
    - action: allow
      protocol: tcp
      destination:
        nets: [<service-load-balancer-ip>]
        ports: [<service-port>]
  selector: load-balancer-ingress=='true'
```

And for traffic to NodePorts - for each non-load-balanced Service - via any
node:

```
apiVersion: v1
kind: policy
metadata:
  name: allow-node-port-service-1
spec:
  preDNAT: true
  order: 90
  ingress:
    - action: allow
      protocol: tcp
      destination:
        ports: [<node-port>]
  selector: host-endpoint=='ingress'
```

And that completes the example. It's worth re-emphasizing, though, two key
points about the application of pre-DNAT policy that make this work, especially
as pre-DNAT policy differs on these points from normal host endpoint policy.

Firstly, there is no 'default drop' semantic for pre-DNAT policy, like there
_is_ for normal policy. So, if policies are defined such that _some_ pre-DNAT
policies apply to a host endpoint, but none of those policies matches a
particular incoming packet, that packet is allowed to continue on its way.
(Whereas if there are normal policies that apply to a host endpoint, and
none of those policies matches a packet, that packet will be dropped.)

**Review comment (Member):** I'm glad you added this note about no 'default drop' for pre-DNAT. That consideration went through my head several times when I was first reading this doc so I'm glad you've explicitly stated it.

For the example here, that means that we can specify some pre-DNAT policy,
applying to all of the cluster's external interfaces, without having to
enumerate and explicitly _allow_ all of the internal flows that may also go
through those interfaces. It's also why the second point works...

Namely, that if traffic comes in through a host endpoint and is routed to a
local workload, any host endpoint pre-DNAT policy is enforced as well as the
ingress policy for that workload - whereas normal host endpoint policy is
skipped in that scenario. (Normal host endpoint policy is 'trumped' by
workload policy, for packets going to a local workload.)

For the example here, that means that the last pre-DNAT policy above does not
accidentally expose workloads that happen to use the same `<node-port>`, or
that provide the backing for `<node-port>`, unless those workloads' own policy
allows that.

**Review comment (Member):** I did not think this was the case. I thought if a packet is received (on the host interface) and there is pre-DNAT policy applied that accepts the packet then it is passed on to the container (no further rules would matter). But I'm not very sure of my understanding in a case like this so I could certainly be wrong. I would be interested in being enlightened on why it will still pass through the workload chains but that can wait till another time.

tl;dr If you're confident that even if a pre-DNAT policy would accept a packet destined for a workload that the workload policy will still be applied and honored then I'm good with this.

**Review comment (Member):** Yes, the behavior Neil describes is what we want (and is hopefully what he's implemented too).

**Reply (Member Author):** @tmjd I think the way to think of this, in general, is that the pre-DNAT policy protects the cluster perimeter, and the workload policy protects the particular workload; and then I think it's obvious that there will be situations where both of those are wanted.
3 changes: 1 addition & 2 deletions master/getting-started/openstack/installation/redhat.md
These steps are detailed in this section.
### Install OpenStack

If you haven't already done so, install OpenStack with Neutron and ML2
networking.

### Configure YUM repositories

32 changes: 17 additions & 15 deletions master/reference/calicoctl/resources/policy.md

#### Spec

| Field      | Description                                                                                                                                                          | Accepted Values | Schema                | Default |
|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|-----------------------|---------|
| order      | (Optional) Indicates the priority of this policy, with lower order taking precedence. No value indicates highest order (lowest precedence).                           |                 | float                 |         |
| selector   | Selects the endpoints to which this policy applies.                                                                                                                    |                 | [selector](#selector) | all()   |
| ingress    | Ordered list of ingress rules applied by this policy.                                                                                                                  |                 | List of [Rule](#rule) |         |
| egress     | Ordered list of egress rules applied by this policy.                                                                                                                   |                 | List of [Rule](#rule) |         |
| doNotTrack | Indicates that the rules in this policy should be applied before any data plane connection tracking, and that packets allowed by these rules should not be tracked.   | true, false     | boolean               | false   |
| preDNAT    | Indicates that the rules in this policy should be applied before any DNAT.                                                                                             | true, false     | boolean               | false   |

The `doNotTrack` and `preDNAT` fields are meaningful only when applying policy to a
[host endpoint]({{site.baseurl}}/{{page.version}}/reference/calicoctl/resources/hostendpoint).
Only one of them may be set to `true` (in a given policy). If they are both `false`, or when applying the policy to a
[workload endpoint]({{site.baseurl}}/{{page.version}}/reference/calicoctl/resources/workloadendpoint),
the policy is enforced after connection tracking and any DNAT.

**Review comment (Member):** I didn't realize that doNotTrack was limited to host endpoints today. I guess (assuming that is true) then preDNAT would be ok to limit to host endpoints too, but it would be good to better understand the implementation cost of making these options supported for all endpoint types. I'm not advocating expanding the scope of this PR, but perhaps we should have a follow-on PR for consideration for Calico 2.5?

**Reply (Member Author):** Yes, doNotTrack has so far been limited to host endpoints. Also I'm pretty sure that the current requirement for preDNAT only needs application to host endpoints.

I think it would be a doable but non-trivial addition to do preDNAT for workload endpoints too - so happy for considering this as a possible feature for 2.5. (Think we need to:

- make sure that relevant workload endpoint chains are in the mangle table
- program workload dispatch chains into the mangle table
- add rules into Calico's mangle PREROUTING chain to dispatch to the workload chains)

**Review comment (Member):** Couple of thoughts:

- There's a dataplane perf impact for every untracked rule that we add so I think there needs to be a clear requirement for workload untracked policy before we pay the per-packet cost for the dispatch chains.
- It'd be hard to get good semantics for untracked policy for workloads. Since the raw PREROUTING chain is before the routing table and before NAT, we don't know whether the packets are workload packets at the point the rules are executed. Also, marking a packet as untracked takes it out of NAT so it breaks things like kube-proxy.

See [Using Calico to Secure Host Interfaces]({{site.baseurl}}/{{page.version}}/getting-started/bare-metal/bare-metal)
for how `doNotTrack` and `preDNAT` can be useful for host endpoints.
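As a sketch tying the flags to the spec fields above (the name, port, and
selector here are illustrative), a pre-DNAT policy for host endpoints might be
written as:

```
apiVersion: v1
kind: policy
metadata:
  name: sample-pre-dnat-policy
spec:
  preDNAT: true
  order: 100
  ingress:
    # Enforced before DNAT, so this matches the packet's original
    # destination port.
    - action: allow
      protocol: tcp
      destination:
        ports: [30080]
  selector: has(host-endpoint)
```

Setting `preDNAT: true` and `doNotTrack: true` together in one policy is
invalid, per the rule above.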

#### Rule
