
Rack awareness, replication across availability zones #5738

Closed
solsson opened this issue Mar 30, 2018 · 5 comments


@solsson

solsson commented Mar 30, 2018

Can minio instances be started with an arg/env that identifies the availability zone, so that replication spans as many zones as possible?

Expected Behavior

In a multi-zone cluster at, for example, Google or AWS, a minio cluster spanning multiple availability zones in the same region should replicate across zones as much as possible.

For example, with 6 instances across 3 zones and the standard storage class set to EC:3, there should be one replica per zone.

Current Behavior

I don't know :)

Possible Solution

A startup script can look up the zone name from the machine and provide it to the container as an argument or environment variable. I think config.json should stay the same across all minio instances in the cluster.

Context

We have this for Kafka, Yolean/kubernetes-kafka#41.

Your Environment

Kubernetes, using the official Helm chart, on GKE.

@krishnasrinivas
Contributor

krishnasrinivas commented Apr 4, 2018

@solsson it is hard to understand the description of the issue :-) I think you have misunderstood the configuration of minio. We do not use the term "availability zone" in our documentation; I think you mean the "region" setting?

Typically a minio cluster will be deployed such that the minio servers can span across racks, but never across data centers. If you want to replicate data in a minio cluster to another cluster in a different data center, you can use mc mirror (github.com/minio/mc).

@solsson
Author

solsson commented Apr 4, 2018

@krishnasrinivas Thanks for trying anyway :) I tried to follow the issue template. Maybe this should start as a Slack discussion, but the problem with that is that others will not find it later.

Two use cases:

  • I have a Kubernetes cluster at GKE in region europe-west1, spanning three zones europe-west1-(a|b|c). There are 9 minio pods and the standard storage class is 3. How do I reduce the risk that some blobs become unreachable if a single zone becomes unavailable, because those particular blobs happened to have all three replicas there?
  • At home I set up a minio cluster for long term storage of my digital data. I have four servers, two in "garage" and two in "basement". Network is fast, i.e. same region. Storage class is N/2, which if I've understood minio correctly means that I get two replicas (at least) of each file. How do I ensure that I get one replica in "garage" and one in "basement" so I don't lose my files if there's a fire in any of them?

The term "rack" is from Kafka: https://kafka.apache.org/documentation/#basic_ops_racks

@balamurugana
Member

@solsson

I have a Kubernetes cluster at GKE in region europe-west1, spanning three zones europe-west1-(a|b|c). There's 9 minio pods and standard storage class is 3. How do I reduce the risk that some blobs become unreachable if a single zone becomes unavailable, because those particular blobs happened to have all three replicas there?

minio uses erasure coding to store data safely across multiple machines. You could refer to https://docs.minio.io/docs/distributed-minio-quickstart-guide

What I understand from your question is that you would like to run a DR site in another region (or regions). As @krishnasrinivas pointed out, you could use the mc mirror command, which mirrors one minio site to another in a different region. This can be run as a cron job to automate it.

At home I set up a minio cluster for long term storage of my digital data. I have four servers, two in "garage" and two in "basement". Network is fast, i.e. same region. Storage class is N/2, which if I've understood minio correctly means that I get two replicas (at least) of each file. How do I ensure that I get one replica in "garage" and one in "basement" so I don't lose my files if there's a fire in any of them?

You could run distributed minio on the machines in "garage" and "basement". This way erasure coding takes care of data high availability and fault tolerance.
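The arithmetic behind this answer can be sketched as follows. This is a simplified model, not MinIO code: it assumes one drive per pod or server, a single erasure set spanning all drives (so every object has one shard on every drive), and that an object stays readable as long as no more than `parity` shards are lost. Under those assumptions, a whole zone or room can fail safely exactly when no zone holds more drives than the parity count. The drive counts come straight from the two questions and are taken at face value; whether MinIO accepts a given drive count is not checked here, and write quorum, which has its own rules, is not modeled. The function name `zoneFailureTolerated` is hypothetical.

```go
package main

import "fmt"

// zoneFailureTolerated is a hypothetical helper. It reports whether every
// object can still be read after all drives in any single zone are lost.
// Assumed model: one shard per drive, one erasure set over all drives,
// and reads succeed while at most `parity` shards are missing.
func zoneFailureTolerated(drivesPerZone map[string]int, parity int) bool {
	for _, n := range drivesPerZone {
		// Losing this zone removes n shards of every object.
		if n > parity {
			return false
		}
	}
	return true
}

func main() {
	// GKE case: 9 drives, 3 per zone, standard parity 3 (6 data + 3 parity).
	gke := map[string]int{"europe-west1-a": 3, "europe-west1-b": 3, "europe-west1-c": 3}
	fmt.Println(zoneFailureTolerated(gke, 3)) // true

	// Home case: 4 drives, 2 per room, parity N/2 = 2 (2 data + 2 parity).
	home := map[string]int{"garage": 2, "basement": 2}
	fmt.Println(zoneFailureTolerated(home, 2)) // true

	// Uneven placement: 3 drives in one room exceeds parity 2.
	uneven := map[string]int{"garage": 3, "basement": 1}
	fmt.Println(zoneFailureTolerated(uneven, 2)) // false
}
```

In this model the placement question reduces to spreading drives evenly: because each object is striped over every drive, no per-object "replica placement" decision is needed.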

@solsson
Author

solsson commented Apr 10, 2018

Thanks. I'll read up on erasure coding. I falsely assumed that N/2 denoted a regular replica count, but in https://docs.minio.io/docs/minio-erasure-code-quickstart-guide I see that:

In 12 drive example above, with Minio server running in the default configuration, you can lose any of the six drives and still reconstruct the data reliably from the remaining drives.

where the word "any" is key.
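The quoted 12-drive example can be checked with a small sketch. It assumes the simplified model implied by the documentation quote: each object is split into data and parity shards, one per drive, and any `totalDrives - parity` surviving shards are enough to rebuild it. With the default N/2 parity on 12 drives that is 6 data + 6 parity shards, so only the number of failed drives matters, not which ones. The function name `readable` is hypothetical, not a MinIO API.

```go
package main

import "fmt"

// readable is a hypothetical helper. It reports whether an object
// erasure-coded across totalDrives drives (one shard per drive) can be
// reconstructed after `lost` drives fail. Assumed model: any
// (totalDrives - parity) surviving shards are sufficient.
func readable(totalDrives, parity, lost int) bool {
	data := totalDrives - parity
	return totalDrives-lost >= data
}

func main() {
	// 12 drives with the default N/2 parity: 6 data + 6 parity shards.
	for lost := 5; lost <= 7; lost++ {
		fmt.Printf("lose %d of 12: readable=%v\n", lost, readable(12, 6, lost))
	}
	// Prints:
	// lose 5 of 12: readable=true
	// lose 6 of 12: readable=true
	// lose 7 of 12: readable=false
}
```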

@solsson solsson closed this as completed Apr 10, 2018
@nitisht nitisht removed the triage label Apr 10, 2018
@lock

lock bot commented Apr 25, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Apr 25, 2020