Releases · outbrain-inc/orchestrator

14 Oct 07:32

v1.4.440

5bbb55c

GA Release v1.4.440

Better error message in ReadTopologyInstance, hinting at the exact origin of the problem


13 Oct 12:27

shlomi-noach

v1.4.438

a995d4b

GA Release v1.4.438

analysis & recovery:

- added DeadCoMasterAndSomeSlaves analysis
- better DeadCoMaster handling: now regrouping slaves based on
GTID/Pseudo-GTID and attempting to promote the co-master on top.
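To make the distinction between the two analyses concrete, here is a minimal Go sketch. The `analysisInput` struct, its fields, the extra labels and the exact conditions are all assumptions for illustration, not orchestrator's actual analysis code. It treats an unreachable co-master as dead when its reachable slaves have stopped replicating, and reports DeadCoMasterAndSomeSlaves when some of those slaves are unreachable as well.

```go
package main

import "fmt"

// analysisInput is a hypothetical, simplified view of what an analysis pass
// knows about one co-master and the slaves replicating from it.
type analysisInput struct {
	IsCoMaster             bool
	Reachable              bool // the last check of the co-master succeeded
	CountSlaves            int  // all known slaves of the co-master
	CountValidSlaves       int  // slaves that could be reached
	CountReplicatingSlaves int  // reachable slaves that are still replicating
}

// classify returns a hypothetical analysis label. The conditions are
// assumptions made for this sketch, not orchestrator's exact rules.
func classify(a analysisInput) string {
	switch {
	case !a.IsCoMaster || a.Reachable:
		return "NoCoMasterFailure"
	case a.CountValidSlaves == 0 || a.CountReplicatingSlaves > 0:
		// Nothing confirms the failure, or slaves still replicate from it.
		return "UnconfirmedFailure"
	case a.CountValidSlaves < a.CountSlaves:
		// The co-master is dead and some of its slaves are unreachable too.
		return "DeadCoMasterAndSomeSlaves"
	default:
		// The co-master is dead; all its slaves are reachable but not replicating.
		return "DeadCoMaster"
	}
}

func main() {
	fmt.Println(classify(analysisInput{IsCoMaster: true, CountSlaves: 3, CountValidSlaves: 3}))
	fmt.Println(classify(analysisInput{IsCoMaster: true, CountSlaves: 3, CountValidSlaves: 2}))
}
```

Ordering the cases from "not a failure" to "most specific failure" keeps each condition short; the real analysis likely weighs more signals than these counters.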


12 Oct 14:42

shlomi-noach

v1.4.437

e3601f4

GA Release v1.4.437

Better handling of DeadCoMaster. Previously this used the same logic as
DeadIntermediateMaster, but the two do not really behave the same way.
The correct way to recover a dead co-master is to relocate all of its
slaves to its still-working co-master, and that's it.

Furthermore, post-failover scripts should run as master-failover processes,
not intermediate-master-failover processes.
Recovery should only be automated if the cluster is eligible for
automated-master-recovery, not automated-intermediate-master-recovery.
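The recovery rule above can be sketched in Go as a pure planning step. `instance`, `clusterPolicy` and `planDeadCoMasterRecovery` are hypothetical names for this example. The sketch maps every slave of the dead co-master to the still-working co-master, and refuses to plan when the cluster is not eligible for automated master recovery, regardless of the intermediate-master flag.

```go
package main

import (
	"errors"
	"fmt"
)

// instance is a hypothetical, minimal view of a MySQL server.
type instance struct {
	Key       string // host:port
	MasterKey string // host:port of the server it replicates from
}

// clusterPolicy holds hypothetical per-cluster automation flags.
type clusterPolicy struct {
	AutoMasterRecovery             bool
	AutoIntermediateMasterRecovery bool
}

// planDeadCoMasterRecovery maps each slave of the dead co-master (except the
// surviving co-master itself) to its new master: the surviving co-master.
func planDeadCoMasterRecovery(dead, survivor instance, slaves []instance, policy clusterPolicy) (map[string]string, error) {
	// A dead co-master is a master failure: only the master-recovery flag counts.
	if !policy.AutoMasterRecovery {
		return nil, errors.New("cluster is not eligible for automated master recovery")
	}
	plan := make(map[string]string)
	for _, s := range slaves {
		if s.MasterKey != dead.Key || s.Key == survivor.Key {
			continue
		}
		plan[s.Key] = survivor.Key
	}
	return plan, nil
}

func main() {
	dead := instance{Key: "db1:3306", MasterKey: "db2:3306"}
	survivor := instance{Key: "db2:3306", MasterKey: "db1:3306"}
	slaves := []instance{survivor, {Key: "db3:3306", MasterKey: "db1:3306"}, {Key: "db4:3306", MasterKey: "db1:3306"}}
	plan, err := planDeadCoMasterRecovery(dead, survivor, slaves, clusterPolicy{AutoMasterRecovery: true})
	fmt.Println(plan, err)
}
```

Returning a plan rather than relocating directly keeps the decision testable; actually moving the slaves is left out of this sketch.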


08 Oct 12:41

shlomi-noach

v1.4.428

0811321

GA Release v1.4.428

config: added PostponeSlaveRecoveryOnLagMinutes

postponed functions:
- introducing PostponedFunctionsContainer type (to add postponed functions
and later invoke them)
- MultiMatchBelow() accepts PostponedFunctionsContainer; on single-slave
buckets where the slave has SQLDelay, matching that slave is postponed
- RegroupSlaves() accepts PostponedFunctionsContainer to support the
above
- RegroupSlavesIncludingSubSlavesOfBinlogServers() accepts
PostponedFunctionsContainer to support the above
- TopologyRecovery includes PostponedFunctionsContainer
- RecoverDeadMaster uses TopologyRecovery as postponed functions
container upon RegroupSlavesIncludingSubSlavesOfBinlogServers()
  - effectively: SQL-delayed replicas are not blocking the entire
recovery operation but are postponed (see
PostponeSlaveRecoveryOnLagMinutes config).


07 Oct 19:08

shlomi-noach

v1.4.426

55a67d6

GA Release v1.4.426

web:

- fixed bug in cluster.js where a virtual node threw a JS error
- worked around golang's encoding/json rendering an empty (nil) slice as
"null" rather than "[]"


07 Oct 15:41

shlomi-noach

v1.4.425

f4f578b

GA Release v1.4.425

metrics:

- writing resolve_dao metrics


07 Oct 14:21

shlomi-noach

v1.4.424

7df9594

GA Release v1.4.424

recovery:

- introducing PostponedFunctions: functions that may be added as recovery
progresses and are invoked after all PostRecovery* processes execute
- RecoverDeadMaster uses postponed functions for executing detach-slave
on lost slaves
- RecoverDeadMaster uses postponed functions for executing
begin-downtime on lost slaves, dead master
- RecoverDeadMaster uses postponed functions for executing repoint on
binlog servers in a binlog server topology
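The ordering described above, where postponed functions are collected during recovery and run only after the post-recovery processes, can be sketched in Go as follows. The `recovery` type, `recoverDeadMaster` here and the step labels are hypothetical; only the ordering is taken from the notes.

```go
package main

import "fmt"

// recovery is a hypothetical recovery run that logs the steps it executes.
type recovery struct {
	log       []string
	postponed []func()
}

// run executes (here: just logs) a step immediately.
func (r *recovery) run(step string) { r.log = append(r.log, step) }

// postpone queues a step to be executed after post-recovery processes.
func (r *recovery) postpone(step string) {
	r.postponed = append(r.postponed, func() { r.run(step) })
}

// recoverDeadMaster illustrates only the ordering: main recovery steps,
// then post-recovery processes, then every step postponed along the way.
func recoverDeadMaster(lostSlaves []string) []string {
	r := &recovery{}
	r.run("regroup slaves and promote new master")
	for _, s := range lostSlaves {
		r.postpone("detach-slave " + s)
		r.postpone("begin-downtime " + s)
	}
	r.postpone("begin-downtime dead master")
	r.run("post-recovery processes")
	for _, f := range r.postponed {
		f()
	}
	return r.log
}

func main() {
	for _, step := range recoverDeadMaster([]string{"db5:3306"}) {
		fmt.Println(step)
	}
}
```

Deferring housekeeping such as detaching and downtiming lost slaves keeps it off the critical path between detecting the failure and running the post-recovery processes.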

async_request:
- added, but currently UNUSED, since postponed functions work well enough
- db: added async_request table
- added async_request.go, async_request_dao.go
- the idea is to be able to request operations such as "relocate" that
will be executed asynchronously by the elected service (implementing a
job queue)
- it is not yet clear when this will be used

metrics:
- fixed bug where metrics callbacks were not being called
- instance_dao writes some instance read/write metrics


07 Oct 06:59

shlomi-noach

v1.4.421

aef58f6

GA Release v1.4.421

db:

- added these columns to topology_recovery for better auditing:
  - participating_instances
  - lost_slaves
  - all_errors

recovery:
- TopologyRecovery holds more data (see above) and is now used for
writes as well
- dao reads & writes added columns in topology_recovery
- ResolveRecovery() uses recovery_id rather than host/port combination
- TopologyRecovery reference used throughout recovery process and holds
the state of things (BTW this is the most stateful data in orchestrator
at this time)
- TopologyRecovery object collecting errors as recovery makes progress,
for future visibility
- TopologyRecovery object collecting lost slaves as recovery makes
progress, for future visibility
- TopologyRecovery object collecting participating instances as recovery
makes progress, for future visibility

web:
- fixed bug in cluster.js leading to infinite loop on master-master
replication and loss of drag-n-drop functionality


24 Sep 13:51

shlomi-noach

v1.4.415

900aba4

GA Release v1.4.415

recovery:

- dao reads is_successful
- audit-recovery presents FAIL status for failed recoveries


18 Sep 14:15

shlomi-noach

v1.4.412

95dafc6

GA Release v1.4.412

Fixed resolve issue leading to unjustified fatal errors

