DESIGN: Services v2 #1107
Comments
In Options 2) and 3):
s/that is not/that is/ ?
@proppy fixed
These are listed as cons but they can be pros. Would list the former as requiring a new API object (which we might want to do anyway). I think that the second one is technically a pro because the user can rebind a port for their own reasons.
I assume the "ambassador definition" being added to a pod definition has merit on its own - in that services are "magic" and clients can't define their own names for those env vars, whereas with an ambassador def you can choose the port and the env vars that get used.
@smarterclayton re "can be pros". I listed them as cons because they add time and complexity as prerequisites. Will reword.
I think it's clear we want to move to private ambassadors, but it is extra work and complexity that we can defer in some of these options. Specifically, we can move from option 1 to option 2 easily. Option 3 requires more up-front work.
Besides load balancing, are there any other use cases for the service object? Could a set of pods be used to declare more than one service? For example, a pod could declare a service for end users and also provide an admin service. If so, in the models with private ambassadors, would two kube-proxies have to be added to the pod to listen on those Portal IPs, or would a single one listen on both Portal IPs?
I vote for option one. Where is the reasoning for private ambassadors? They sound like an anti-feature to me. I think that the proxy being multi-tenant is a feature, not a bug. I don't think we should force users to change their pod specs to deal with our weird IPTables magic because we're hacking together IP-per-pod. (Note that I don't think proxy should stay multi-tenant, because I think it should go away--I'd prefer to see us rejigger the network layer such that it's just plain not necessary. And if we're going to attempt that feat, then we definitely shouldn't be forcing users to predeclare service usage and add slots for ambassadors.)
You can imagine services that are not load balanced but sharded or master …
Yes, pods can be in more than one service, though I am not sure why. You …
Multi-tenant is one reason to go away from single proxies. The other is …
We actually CAN make the network magic enough for simple round robin, but …
@lavalamp What parts of private ambassadors are anti-feature to you? Having to have a real container in your pod that's acting as the ambassador (vs. having the infrastructure provide a virtual ambassador)? If so I agree on that point - I don't think we should require modeling relationships / dependencies as actual proxies in containers. The default ambassador pattern should be whatever can provide the most flexibility / reliability out of the list above. A real (actual in a container) ambassador is something you can always add to your pod if you want - bonus points if you can easily parameterize the ambassador container from the same info the infra uses. It may also be of value to enable types of ambassadors for the user to select from - even if the first implementation is one of the above, I'm not so sure that being able to have the ambassador qualify its needs (I need PTR records to work, I need secure TLS tunneling, I need this traffic to be high bandwidth) won't be valuable. It's then up to the infrastructure / plugins to satisfy those needs if it can.
Are you saying that pods (which are explicitly Kubernetes concepts) shouldn't declare what they depend on, but should automatically have dependencies injected by the existence of a service? I'd argue that it's very important to enable authors to use images that are not dependent on Kubernetes concepts (isolated from knowing about the environment they run in). A part of that is enabling pod creators to declare how a dependency is manifest. An example is saying:
Some comments:
This was one of the original mission goals of libswarm - to offer a discovery endpoint within a container that was standard to all Docker runtime environments that arbitrary code could introspect. The host environment could then offer arbitrary service discovery as well. I think it's a good objective and fits in with what the Docker ecosystem can accomplish by setting conventions.
On Thu, Sep 4, 2014 at 2:23 PM, Joe Beda wrote:
Added JSON for services and YAML for pods. Does this capture what you …
That's true, but I am trying to cover the more common case in this …
More on why private ambassadors are an anti-feature: Eventually, it seems like the awesomest way to end up is with the service IP being a real IP address that the network fabric understands and not IPTables magic. In that case, the ambassador becomes needless bloat. So let's not ever introduce it if possible. (IPTables magic is an anti-feature IMO and should not be the intended end state.) If multi-tenancy is the issue, and true network isolation is required, that should be accomplished by putting every tenant on their own isolated virtual network, not via options 2 or 3, which are security through obscurity if I understand it correctly. Additionally now that I see some JSON for this, it seems arcane to force the user to specify the port in both service definition and also repeat the port number in their pod in order to use the service. Port number should be passed to the pod along with the IP address via whatever discovery method we come up with. With IP-per-service port number effectively doesn't matter.
@thockin Thanks for the YAML/JSON - it helps a lot. On 3: would we need a portal.targetPort in the pod definition? More questions:
@lavalamp if we end up with ambassadors/portals that are "thicker" in terms of implementing some protocol specific logic, we'll want that to be versioned and run like other user code. I think we can split this up different ways:
My gut is to keep things simple for now and so I like (3) but I can understand the desire to put the portal/ambassador/proxy/forwarder on its own IP.
@jbeda The "target" port is captured by the extant Service.containerPort …
Multiple portals to the same service: As currently spec'ed, portals target …
"Type" would be a property of the Service, I think, though you could maybe …
As we open the door to different kinds of policies, I think we will need to …
On Fri, Sep 5, 2014 at 12:30 PM, Joe Beda wrote:
On Fri, Sep 5, 2014 at 11:01 AM, Daniel Smith wrote:
longer term: Some cloudproviders will provide "true" VIPs for internal …
Some cloudproviders will not, which forces us to schedule an haproxy …
In neither case is there a need for a real process acting as …
The decision of whether clients should declare their desire to be able …
It's not through obscurity by necessity, it could be implemented as a …
Some hypothetical pod p3 would not be able to connect to s1, nor would …
That's a bug - there is no user-visible service port in this model. Fixed.
I more or less have kept my opinion to myself, but here's my thoughts.
I think option 3 is marginally simpler over all, but requires a lot more …
For that reason I think #1 is a better solution for right now, with growth …
On Fri, Sep 5, 2014 at 1:27 PM, Joe Beda wrote:
We already know this is a customer requirement we plan to satisfy with ovs and private vlans (or similar), where namespaces/projects/policy defines a set of routing rules that allow subdivision of the internal kube network. It's primarily for medium trust multitenant environments where you want shared resources but an extra layer of defense in the event of a container compromise.
@thockin I actually think that option 3 is pretty easy to get done:
That's still notably more work than option 1, with deeper impact on users …
We should do a quick discussion whether container predeclaration of dependencies is the right pattern, make the pros and cons clear, and try to set a direction. That's probably orthogonal to this discussion, but I think it's more important to what we're trying to achieve with building containerized applications than some of the mechanical details of how interconnections should work (this issue, no offense meant Tim), and I don't get the feeling from this thread that we have total consensus. I don't think we have an issue to contain it but I'm fine spawning one separately. Services are globally injectable today, and clients (pods) cannot mutate the form they appear in the container, nor can they adapt the logical entity the service represents (collection of pods) to their needs. They intermediate pods from knowing about changes to other pods, but pods aren't the only thing that a client container might want intermediated (things outside Kube, external IPs or DNS). Services are global but that only scales to a fairly low bound, and it's unlikely all pods in the scope care about all services but must worry about collision. Some of the solutions to those issues potentially reduce the migration / use pattern change we are worried about between 1 and the others.
Related topic: do we have multiple protocols/ports specified in a service target spec? See #1205.
One thing I'd like to make sure we have out of the next version of services (based on discussions on IRC) is that an application referencing a service can come up before that service is defined.
Agree that ordering should be irrelevant. Agree we need some consensus on …
That said, I think we can move forward with this proposal option 1, and …
On Mon, Sep 8, 2014 at 9:03 AM, Joe Beda wrote:
Do we have confidence that we can make this work in every network environment? If so, I'm okay with doing (1) to start.
I'm against option (3) of using portals on the localhost IP to connect to services. One reason is that I can't keep a 1:1 mapping of ports in the general case. Consider the case where I have multiple services running MySQL, one with a user database and another with a comments database. Both run in separate pools of pods, and each MySQL server listens on port 3306 of its own pod. I'd like to be able to connect to userdb:3306 to get to the users database and to commentsdb:3306 to get to the comments database. But if I'm using localhost portals, then I can't have both of them use port 3306, which means I have to start using non-standard ports. Another problem with depending on localhost portals is that it doesn't scale: as soon as I want to scale out my service and use a real load balancer in front of my MySQL hosts, I want to be able to connect to the real IP of the load balancer directly, in which case a localhost portal only creates the need to keep networking magic around when it's no longer doing any useful work... I'd say aim for this:
Re (1) or (2) I don't have a strong preference, I see both of them as solutions for "toy" setups so I'm not sure they deserve a lot of consideration. I think I lean towards (2) since then the private proxy can run on the same pod/machine and it doesn't require coordination of external resources.
Agree portals shouldn't force you to localhost, or to change ports. I don't think that forcing localhost portals should be part of the solution. However, localhost:3306 is strictly better than <random_ip>:3306 for use cases where you want to connect to a single database.
If localhost portals are optional then you can just change your pod config to drop them. How those portals are exposed into your code shouldn't have to change. EDIT of EDIT: you're advocating using service ip directly, I think that's valuable for some cases, but the value in localhost is that for most apps in most cases you don't need to do anything to your code, whereas for service IP you still have to configure your pod / code to connect to them.
I'm 99.5% sure that these portal IPs never touch the wire. The only …
On Tue, Sep 9, 2014 at 11:17 AM, Joe Beda wrote:
Regarding "toy" solutions - I think we'll find it sufficient for a large …
On Tue, Sep 9, 2014 at 11:33 AM, Filipe Brandenburger wrote:
Added a short note on decision. Will flesh out the text and go back to implementation and naming bikeshedding.
With a shared ambassador (or suitably privileged private ambassadors) running on the local host, there should be no need to actually modify the packet headers and adding DNAT rules will just double the amount of connection tracking performed by the local kernel. Add the service IPs to the right veth interfaces/namespaces (or mess with "local" routes in the routing table) and the ambassador should be able to bind to the right IP+ports and just listen/reply directly.
#1402 is in. Closing this doc now, though I am sure we will revisit it when private ambassadors come up next.
Iptables DNAT to load balancer in option 3 will not work: you can't DNAT localhost traffic to another host. Even if you somehow manage to do it using policy routing rules, changing the "local" routing table and such (though I couldn't, and wasn't able to find anyone who could), it would still be an unsupported solution that could break in the next kernel release.
This issue is still linked from here in the kubernetes documentation
Goal
To evaluate options for enhancing the Kubernetes Service abstraction.
Non-Goals
To discuss external IPs bridging into kubernetes clusters. To quibble about names (not yet).
Background
The kubernetes Service abstraction defines a group of pods that can be accessed through a single IP and port, with a policy describing how to access the pods. For example, when a client connects to a service’s IP:port (which it finds through environment variables), a local proxy will round-robin accesses to the constituent pods. This is the only policy supported today, but we envision more before too long. For example, one can easily imagine “real” load-balanced services which have an HAProxy (or similar) in front of them with a real pod IP.
Today this is implemented as a per-minion proxy process which listens on the minion’s primary IP for every service in the cluster, on each service’s port. To be clear: the IP assigned to a service is the IP of the minion the caller is running on. This is exposed as an environment variable, but is effectively a constant. This has a number of drawbacks. First, the proxy is inherently multi-tenant and its resources are not charged to any pod. Second, it forces all services in a cluster to have different port numbers - if any service tries to use a previously consumed port it will fail, but this cannot be known a priori. Third, service ports potentially collide with any pods that use HostPorts and any daemons that run on the minions. Fourth, environment variables cannot be dynamically updated, so running pods cannot know about services started after the pod itself. Fifth, should a pod ever live-migrate it may have to take two network hops to reach a service instead of one, and will be forever subject to the availability of the first minion on which it ran.
For these reasons, I think we can do better. Kubernetes should take the stance that we NEVER make users concern themselves with shared port spaces unless they own all of the shared elements. We started down this path with IP-per-Pod semantics. Service ports are the last violation of this principle.
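The shared-port-space drawback can be illustrated with a toy model. This is a minimal sketch and not kube-proxy code: the `MinionProxy` class, the service names `userdb` and `commentsdb`, and port 3306 are all hypothetical, chosen only to show why a second service cannot reuse a port once the per-minion proxy is listening on it.

```python
class MinionProxy:
    """Toy model of a per-minion proxy with a single shared port space."""

    def __init__(self):
        # Maps a listening port on the minion's IP to the service that owns it.
        self.listeners = {}

    def add_service(self, name, port):
        # The minion IP is shared, so the port alone identifies the service.
        if port in self.listeners:
            owner = self.listeners[port]
            raise ValueError(f"port {port} is already used by service {owner!r}")
        self.listeners[port] = name


proxy = MinionProxy()
proxy.add_service("userdb", 3306)

# A second service on the same port fails, and only at creation time.
try:
    proxy.add_service("commentsdb", 3306)
    collided = False
except ValueError:
    collided = True
```

In this model the failure only shows up when the second service is created, which matches the point that collisions cannot be known a priori.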
There is an orthogonal concern that impacts this design. Today, any pod can access any service. It is not part of any API to be able to specify which services a pod might want to connect to. There have been some arguments that we might want to make that part of the API. For example: “this pod will want to connect to the service named ‘foo’”.
Design
I see a few options that could make this system more elegant.
Terminology
Pod: A kubernetes pod, running 1 or more containers (no special meaning above the normal definition)
Service: A group of Pods, as determined by a label selector, which all offer a common port name/number that serves a single purpose. The canonical example is a pool of HTTP servers that all have the same content available. Services can conceptually have different policies for accessing them, but the only one implemented today is load-balanced.
Ambassador: A piece of executable logic, hosted somewhere, which understands kubernetes label selector groups and implements the policy for a Service. This might be in a cloud-provider service, or in a standalone pod (e.g. an enlightened haproxy), or in a per-node shared process (e.g. kube-proxy). The Ambassador is how clients access a Service (kubernetes-native apps may choose to link or implement an Ambassador directly into their app).
Portal: A stable IP:port pair which grants access to an Ambassador. When a client connects to a Portal, the packets are transported to the Ambassador without the client needing to understand how the Ambassador is implemented. "Stable" means that neither the IP nor port can change for the lifetime of a client of the Service.
Option 1) IP-per-Service, shared ambassador
When a Service is created, we allocate an IP from a special range of IPs. This IP is the portal IP. We broadcast this Service, along with its port number, to all of the (one-per-minion) kube-proxy instances along with the Portal IP. The kube-proxy sets up iptables rules to “steal” traffic to the Portal [IP, port], and redirect it back to itself on a random/ephemeral port. The kube-proxy acts as the Ambassador (the same as today) - and will round-robin traffic across the constituents of the Service.
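The allocation and round-robin behaviour described here might be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: the `PortalAllocator` and `SharedAmbassador` names, the `10.0.0.0/24` range, and the pod endpoints are all hypothetical, and the iptables redirection step is left out entirely.

```python
import ipaddress
import itertools


class PortalAllocator:
    """Hands out portal IPs from a dedicated range (the range is an assumption)."""

    def __init__(self, cidr):
        self._hosts = iter(ipaddress.ip_network(cidr).hosts())

    def allocate(self):
        # Raises StopIteration once the range is exhausted.
        return str(next(self._hosts))


class SharedAmbassador:
    """Toy stand-in for a per-minion proxy that round-robins per portal."""

    def __init__(self):
        self._backends = {}  # (portal_ip, port) -> endless iterator of endpoints

    def register(self, portal_ip, port, pod_endpoints):
        self._backends[(portal_ip, port)] = itertools.cycle(pod_endpoints)

    def pick_backend(self, portal_ip, port):
        return next(self._backends[(portal_ip, port)])


allocator = PortalAllocator("10.0.0.0/24")
ambassador = SharedAmbassador()

userdb_ip = allocator.allocate()
commentsdb_ip = allocator.allocate()

# Both services keep port 3306 because each has its own portal IP.
ambassador.register(userdb_ip, 3306, ["10.244.1.5:3306", "10.244.2.7:3306"])
ambassador.register(commentsdb_ip, 3306, ["10.244.3.9:3306"])

picks = [ambassador.pick_backend(userdb_ip, 3306) for _ in range(3)]
```

Keying the backend table on the (IP, port) pair rather than on the port alone is what removes the cluster-wide port collision in this sketch.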
JSON for a service:
Client pseudocode:
Pros:
Cons:
Option 2) IP-per-Service, private ambassador
Similar to option 1, a Portal IP is allocated for each Service. Unlike option 1, though, the Ambassador is private* to each Pod. This requires that Pods declare which Services they want to access up front (or else the kubelet or other root-namespace, true-root user agent will need to change into each pod namespace for each service add/remove in the whole cluster [iptables rules require true root] -- I assume this is a non-starter), so that the iptables rules can be established in the Pod namespace.
(*) "private" could mean "runs in" for now, but it is really more abstract. If we have different "kinds" of services, some might have real load-balancers as Ambassadors, so the Portal would be just an iptables forwarding rule.
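One way to picture the up-front declaration is as a filter: only the Services a Pod names get portal rules in that Pod's namespace. The sketch below is hypothetical throughout. The function name, the shape of the inputs, and the choice to treat a declared but not yet existing Service as "pending" rather than as an error are assumptions made for illustration, not part of the proposal.

```python
def portals_for_pod(declared, cluster_services):
    """Split a pod's declared services into installable portals and pending names.

    declared: service names the pod says it wants to reach (hypothetical field).
    cluster_services: maps service name -> (portal_ip, port) for existing services.
    """
    install = {}
    pending = []
    for name in declared:
        if name in cluster_services:
            install[name] = cluster_services[name]
        else:
            # Assumption: the pod may start before the service exists.
            pending.append(name)
    return install, pending


cluster = {
    "userdb": ("10.0.0.1", 3306),
    "commentsdb": ("10.0.0.2", 3306),
    "cache": ("10.0.0.3", 11211),
}

# This pod declares two services; only one of them exists so far.
install, pending = portals_for_pod(["userdb", "search"], cluster)
```

Undeclared services such as `cache` never appear in the result, so no rules for them would be needed in this Pod's namespace.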
JSON for a service:
YAML for a pod:
Client pseudocode:
Pros:
Cons:
Option 3) Localhost portals, private ambassadors
Instead of allocating an IP for each Service, this option requires that Pods declare which Services they want to access up front, and that they specify an unused port number on localhost which will become the Portal. Like option 2, the Ambassador will be private to each pod, which could mean running a kube-proxy or configuring iptables or other implementations.
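The requirement that each declared portal use an unused localhost port can be sketched as a simple validation step. This is a minimal illustration, assuming a hypothetical `build_localhost_portals` helper and a declaration format of (service name, local port) pairs. Neither comes from the proposal.

```python
def build_localhost_portals(declarations, ports_in_use=()):
    """Map localhost (ip, port) portals to service names.

    declarations: iterable of (service_name, local_port) pairs (hypothetical shape).
    ports_in_use: localhost ports the pod's own containers already bind.
    """
    portals = {}
    taken = set(ports_in_use)
    for service, port in declarations:
        if port in taken:
            raise ValueError(
                f"localhost port {port} is already in use; "
                f"cannot create a portal for {service!r}"
            )
        taken.add(port)
        portals[("127.0.0.1", port)] = service
    return portals


# Two MySQL-style services need distinct local ports in this model.
portals = build_localhost_portals([("userdb", 3306), ("commentsdb", 3307)])

# Asking for the same local port twice is rejected.
try:
    build_localhost_portals([("userdb", 3306), ("commentsdb", 3306)])
    clash = None
except ValueError as err:
    clash = str(err)
```

Because every portal shares the single localhost address, two services that both listen on the same port cannot both keep it in this model, so one of them has to be remapped.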
YAML for a pod:
Client pseudocode:
Pros:
Cons:
Decision
We are going to pursue option 1 in the short term. It solves the problem of port collisions without requiring all users to pre-declare their needed Services. We will probably proceed to option 2 or even option 3 (or some combination thereof) later.
Notes for later work:
If we implement “real load balanced” services in options 1 or 2, we need to DNAT portal traffic to the load-balancer IP. The iptables to steal traffic and redirect to a different IP:
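A rule of this general shape can be sketched as follows. It is written under assumptions and is not taken from the proposal itself: the `dnat_rule` helper, the example addresses, and the default `OUTPUT` chain are illustrative. The appropriate chain depends on where the traffic originates (`OUTPUT` for locally generated packets, `PREROUTING` for packets arriving on another interface). The helper only builds the argument list and does not execute anything.

```python
def dnat_rule(portal_ip, portal_port, lb_ip, lb_port, chain="OUTPUT"):
    """Build an iptables argv that rewrites portal traffic to a load balancer.

    All addresses are caller-supplied; nothing here is run or validated
    against a real system.
    """
    return [
        "iptables", "-t", "nat", "-A", chain,
        "-p", "tcp",
        "-d", f"{portal_ip}/32",
        "--dport", str(portal_port),
        "-j", "DNAT",
        "--to-destination", f"{lb_ip}:{lb_port}",
    ]


# Hypothetical portal 10.0.0.1:3306 forwarded to a load balancer at 10.244.9.9:3306.
rule = dnat_rule("10.0.0.1", 3306, "10.244.9.9", 3306)
command_line = " ".join(rule)
```

Installing such a rule requires root in the relevant network namespace, so the sketch stops at producing the command.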
If we implement “real load-balanced” services in option 3, we need to DNAT localhost traffic destined for the portal to the load-balancer IP. The iptables to steal localhost traffic and redirect to a different IP (has to run in-namespace):