Distributed backup scheduling #1

Open
hgfischer opened this issue Oct 17, 2017 · 0 comments

CaOps should be able to receive an API call from any node that triggers a cluster-wide backup. This requires:

  • The backup to be as synchronized as possible across nodes: so that network delays do not desynchronize the agents, the API will always schedule the backup for the nearest rounded time (see the sketch after this list).
  • Under normal usage, all Cassandra nodes must be up and running; no nodes may be joining or leaving.
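
A minimal sketch of the rounding idea, assuming a hypothetical 5-minute scheduling window (the issue does not fix an interval): every agent rounds the current time up to the next window boundary, so concurrent API calls hitting different nodes converge on the same start time.

```go
package main

import (
	"fmt"
	"time"
)

// nextRoundedTime rounds "now" up to the next interval boundary so that every
// agent receiving the API call within the same window computes the same
// backup start time. The 5-minute interval used below is only an assumption.
func nextRoundedTime(now time.Time, interval time.Duration) time.Time {
	return now.Truncate(interval).Add(interval)
}

func main() {
	now := time.Now().UTC()
	start := nextRoundedTime(now, 5*time.Minute)
	fmt.Printf("API call at %s -> backup scheduled for %s\n",
		now.Format(time.RFC3339), start.Format(time.RFC3339))
}
```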

Since we are not sure how reliable Serf is on big clusters, we might need to implement something else to keep the scheduling consistent. Two options I can think of are:

  • Using Cassandra itself, with a table whose reads and writes use a consistency level covering all nodes. This appears to be the easiest solution, but it has many trade-offs; the main one is that this table would behave like a queue, which is an anti-pattern in Cassandra (see the sketch after this list).

  • Using HashiCorp's Raft library, so each CaOps agent keeps its own persisted state, which can also be controlled more precisely for this particular use case.
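
For the first option, a rough sketch of what the Cassandra-backed schedule could look like, using the gocql driver. The keyspace, table, and column names are hypothetical (the issue defines no schema), the 5-minute slot matches the rounding assumption above, and the lightweight transaction (IF NOT EXISTS) is my addition so that only one agent claims a given slot; the issue itself only mentions the consistency level.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	// Hypothetical schema, not defined in the issue:
	//   CREATE TABLE caops.backup_schedule (slot timestamp PRIMARY KEY, requested_by text);
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "caops"
	// Read/write at the strongest consistency so every node sees the same schedule.
	cluster.Consistency = gocql.All

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Round to the next 5-minute boundary, matching the rounding sketch above.
	slot := time.Now().UTC().Truncate(5 * time.Minute).Add(5 * time.Minute)

	// Lightweight transaction: only the first agent to claim the slot wins, so a
	// burst of API calls on different nodes still yields one cluster-wide backup.
	previous := map[string]interface{}{}
	applied, err := session.Query(
		`INSERT INTO backup_schedule (slot, requested_by) VALUES (?, ?) IF NOT EXISTS`,
		slot, "node-a",
	).MapScanCAS(previous)
	if err != nil {
		log.Fatal(err)
	}
	if applied {
		fmt.Println("this agent scheduled the backup for", slot)
	} else {
		fmt.Println("backup already scheduled by", previous["requested_by"])
	}
}
```

This also illustrates the queue-like trade-off mentioned above: the table accumulates one row per slot and every agent races on the same partition keys, which is exactly the access pattern Cassandra handles poorly.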
