Use Fleet machine metadata for "environments" #41

Open
lukebond opened this issue Mar 23, 2015 · 19 comments

Comments

@lukebond
Contributor

Let's say you want dev, QA, staging and production clusters. Rather than running multiple clusters of Paz, they could all live in the same cluster and use Fleet machine metadata to schedule units only on hosts tagged with their environment.

e.g. with 4 environments, each made up of 3 nodes, you might have the following metadata for them:

| Host | Name | Metadata |
| --- | --- | --- |
| host1 | dev1 | environment=dev |
| host2 | dev2 | environment=dev |
| host3 | dev3 | environment=dev |
| host4 | qa1 | environment=qa |
| host5 | qa2 | environment=qa |
| host6 | qa3 | environment=qa |
| host7 | staging1 | environment=staging |
| host8 | staging2 | environment=staging |
| host9 | staging3 | environment=staging |
| host10 | prod1 | environment=prod |
| host11 | prod2 | environment=prod |
| host12 | prod3 | environment=prod |

More can be read about Fleet scheduling with metadata here: https://coreos.com/docs/launching-containers/launching/launching-containers-fleet/#schedule-based-on-machine-metadata
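To make the scheduling idea concrete, here is a minimal Python sketch of the matching rule, assuming a simplified model in which a unit is eligible for a machine only when the machine's metadata contains every key=value pair the unit asks for. The `MACHINES` mapping and the `eligible_machines` helper are hypothetical names for illustration; they are not part of Fleet or Paz.

```python
ENVIRONMENTS = ["dev", "qa", "staging", "prod"]

# host1..host3 -> dev, host4..host6 -> qa, and so on (three hosts per environment),
# mirroring the twelve hosts listed in the issue description.
MACHINES = {
    f"host{i + 1}": {"environment": ENVIRONMENTS[i // 3]}
    for i in range(12)
}


def eligible_machines(machines, required):
    """Return the names of machines whose metadata satisfies every required pair."""
    return sorted(
        name
        for name, metadata in machines.items()
        if all(metadata.get(key) == value for key, value in required.items())
    )


if __name__ == "__main__":
    # A unit that requires environment=qa can only land on the three QA hosts.
    print(eligible_machines(MACHINES, {"environment": "qa"}))
```

This only models the filtering step; Fleet itself decides which of the eligible machines actually receives the unit.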

Credit to @rimusz for the idea.

@rimusz

rimusz commented Mar 23, 2015

@lukebond
It is better not to mix the production cluster with the rest.
Production is production; it needs to be kept away from the development cluster.

e.g. so that they can use different CoreOS release channels.

@sublimino
Contributor

@rimusz +1 for separating production environment - total isolation, nothing shared if possible

@rimusz

rimusz commented Mar 24, 2015

@sublimino @lukebond
Only the private Docker registry should be used for both, so Docker images can be shared between clusters.

@lukebond
Contributor Author

although i agree with this, there shouldn't be anything in Paz that cares whether you separate them or not. Paz just needs to be aware of environments (ie. a parameter to most REST calls) and translate them down to Fleet machine metadata at deployment time.
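As a rough illustration of "translate them down to Fleet machine metadata at deployment time", the sketch below renders a unit file whose `[X-Fleet]` section carries a `MachineMetadata` requirement derived from an environment parameter. The `render_unit` function, the service and image names, and the `docker run` command line are all assumptions made for the example; this is not how Paz is implemented.

```python
def render_unit(service, image, environment):
    """Render a fleet unit whose [X-Fleet] section pins it to one environment."""
    # Reject values that would break the key=value syntax of the metadata line.
    if not environment or any(c in environment for c in " =,\n"):
        raise ValueError(f"invalid environment name: {environment!r}")
    return "\n".join([
        "[Unit]",
        f"Description={service} ({environment})",
        "",
        "[Service]",
        f"ExecStart=/usr/bin/docker run --rm --name {service} {image}",
        f"ExecStop=/usr/bin/docker stop {service}",
        "",
        "[X-Fleet]",
        # Only machines started with environment=<environment> are eligible.
        f"MachineMetadata=environment={environment}",
        "",
    ])


if __name__ == "__main__":
    print(render_unit("web", "example/web:1.0", "staging"))
```

With this shape, the environment stays a plain parameter on the API side and only becomes Fleet-specific in the last step, which matches the point that Paz need not care whether environments share a cluster.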

although Paz avoids doing infra stuff, i'm beginning to think it would be good to have a cluster provisioning tool (separate from Paz) that allows you to choose Etcd cluster topology, group machines and add metadata, etc.

@rimusz

rimusz commented Mar 24, 2015

@lukebond
Yes, Paz should not care how your dev/production is set up. It is just good practice to keep them separate.

I think the separate cluster provisioning tool makes sense, which as you said: allows you to choose Etcd cluster topology, group machines and add metadata, etc.

@sublimino
Contributor

That cluster provisioning tool(/GUI?) sounds like it could be a cloud-config generator via https://terraform.io/ - in support of immutable infrastructure we should deploy a new host with new config, health check, rebalance containers, and decommission old host? Servers should automatically be distributed between AZs where applicable.
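A cloud-config generator of that kind could start as small as the sketch below, which emits the `coreos.fleet.metadata` setting that CoreOS cloud-config uses to tag a machine. The `render_cloud_config` function and the `role` key are hypothetical; a real tool would also need etcd, units, and update-channel settings, which are left out here.

```python
def render_cloud_config(environment, extra_metadata=None):
    """Render a minimal cloud-config that tags the host with fleet metadata."""
    metadata = {"environment": environment}
    metadata.update(extra_metadata or {})
    # Fleet expects a single comma-separated string of key=value pairs.
    pairs = ",".join(f"{key}={value}" for key, value in sorted(metadata.items()))
    return "\n".join([
        "#cloud-config",
        "coreos:",
        "  fleet:",
        f'    metadata: "{pairs}"',
        "",
    ])


if __name__ == "__main__":
    print(render_cloud_config("prod", {"role": "worker"}))
```

The generated text could then be handed to Terraform or another provisioner as the instance's user data.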

etcd topology - for large deployments CoreOS recommends running a separate 5-node etcd cluster, otherwise etcd should run on each host.

@rimusz

rimusz commented Mar 24, 2015

@sublimino https://terraform.io/ is a good choice for cloud setups. What about bare metal?

Regarding etcd:
We should never, ever run etcd on each host. It is a very bad idea and CoreOS does not recommend it.
I got bitten very badly by that setup.
I would recommend this setup instead:

  1. Up to 9 workers: one etcd node.
  2. From 10 up to 50 worker machines: start with 3 etcd nodes, then increase to 5 and so on.
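The rule of thumb in this list can be written down as a small lookup, sketched here in Python. The `recommended_etcd_nodes` name is hypothetical, and since the comment does not say where steps beyond 5 would fall, the sketch simply stops at 5 for anything above 50 workers.

```python
def recommended_etcd_nodes(workers):
    """Map a worker count to the etcd cluster size suggested in the list above."""
    if workers < 1:
        raise ValueError("need at least one worker")
    if workers <= 9:
        return 1  # small clusters: a single dedicated etcd node
    if workers <= 50:
        return 3  # 10 to 50 workers: three etcd nodes
    return 5      # larger clusters: assumption, the comment only says "increase to 5"


if __name__ == "__main__":
    for count in (3, 9, 10, 50, 51):
        print(count, "workers ->", recommended_etcd_nodes(count), "etcd node(s)")
```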

Also, etcd machines do not have to be very powerful as they only run the etcd cluster, e.g. on GCE g1-small instances work just fine.
I had a long chat about it with Kelsey when he was at London Kubernetes meetup.

@lukebond
Contributor Author

Agree with all of this and aware of the Etcd-on-every-machine anti-pattern from previous experience (and was also at that meet-up). But since Paz doesn't do infra then that's down to whoever sets up the cluster.

@rimusz

rimusz commented Mar 24, 2015

@lukebond yep, it is more for the cluster provisioning tool, which for sure makes sense to have, to prepare a cluster for Paz.

@sublimino
Contributor

@rimusz if those bare-metal machines are accessible via ssh already we could conceivably rewrite the cloud-config file and reboot the server? Would have to ensure they're all on the same release channel.

Also been bitten by etcd 0.4 - hopefully that's fixed in v2, although I've not stressed it myself yet.

Read "on each host" above as "on clusters of three or fewer nodes" - my concern with running fewer than three etcd nodes is loss of resilience and the smallest machine breaking the cluster (AWS micro/small is not sufficient for etcd nodes). How much hand-holding should a provisioning tool do, @lukebond? And possibly it's another issue as I've hijacked this one! :)

As a footnote, the upper bound of etcd nodes required for stability across any cluster size is 5 according to a chat with Alex Polvi via some Chubby engineers. Further nodes add no meaningful resilience.
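The resilience concern can be put in numbers. etcd needs a majority of its members (a quorum) to agree before it accepts a write, so the number of member failures a cluster survives follows directly from its size. The helper names below are hypothetical; the arithmetic is the standard majority-quorum rule and does not by itself prove the five-node ceiling mentioned above.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


if __name__ == "__main__":
    for size in (1, 2, 3, 5, 7):
        print(f"{size} member(s): quorum {quorum(size)}, "
              f"tolerates {failures_tolerated(size)} failure(s)")
```

One and two members tolerate no failures at all, which is the loss of resilience raised for clusters below three nodes; three tolerate one failure and five tolerate two.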

@rimusz

rimusz commented Mar 24, 2015

@sublimino We can offer a choice, e.g. if somebody wants a very small cluster of 3-5 nodes, they can have just one etcd node if they want, and then 3 or 5 etcd nodes depending on cluster size :-)
Yes, the AWS micro/small ones are very bad, but Google g1-small (roughly an AWS small) runs my etcd clusters just fine. This is why I ran away from AWS to GC.

@lukebond regarding this cluster provisioning tool, we need a separate repository under paz-sh.
I was looking forward to starting to play with https://terraform.io/ for my small projects too, so there we can put our heads together to make a nice multi-cloud cluster provisioning tool.

@lukebond
Contributor Author

@rimusz good idea. i took the liberty of choosing a name: https://github.com/paz-sh/clusterform

@rimusz

rimusz commented Mar 24, 2015

👍

@sublimino
Contributor

Splendid!


@rimusz

rimusz commented Mar 25, 2015

Will Paz support already provisioned clusters?
Maybe clusterform can be used there?

@lukebond
Contributor Author

@rimusz currently that's all it supports. there are some helper scripts for bringing up a cluster (for testing/playing only really) but the idea is that you've already got your cluster and then you put Paz on it.

@rimusz

rimusz commented Mar 25, 2015

If Paz is going to use all that metadata, then some instructions need to be provided on which metadata settings need to be set on an existing cluster for Paz to function properly.

@lukebond
Contributor Author

Yes, when we start using it. Currently there are no such requirements but there soon will be, e.g. for tying scheduler and service directory to a particular host (they're the ones that have a DB and therefore need a volume mount and to not move hosts). I've been doing that manually so far.

There will also be some metadata for environments and as you say that needs to be defined and documented.

@rimusz

rimusz commented Mar 25, 2015

Cool
