Releases: outbrain-inc/orchestrator
GA Release v1.4.440
Better error message in ReadTopologyInstance, hinting at the exact origin of the problem
GA Release v1.4.438
analysis & recovery:
- added DeadCoMasterAndSomeSlaves analysis
- better DeadCoMaster handling; now regrouping based on GTID/Pseudo-GTID, attempting to promote co-master on top.
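A sketch of how the two co-master analyses might be told apart. The criteria are assumptions for illustration: the co-master itself fails its check and none of its reachable slaves still replicates; it is a DeadCoMaster when all its slaves are reachable, and DeadCoMasterAndSomeSlaves when some of them are unreachable too. `analysisInput` and `classify` are hypothetical names, not orchestrator's analysis query.

```go
package main

import "fmt"

// analysisInput is a hypothetical, simplified view of what the analysis
// inspects for a co-master; field names are illustrative assumptions.
type analysisInput struct {
	IsCoMaster                  bool
	LastCheckValid              bool // co-master itself reachable on last check
	CountSlaves                 int
	CountValidSlaves            int // slaves that are themselves reachable
	CountValidReplicatingSlaves int // reachable slaves still replicating
}

// classify returns an analysis code under the assumed criteria above.
func classify(a analysisInput) string {
	switch {
	case !a.IsCoMaster || a.LastCheckValid || a.CountSlaves == 0 || a.CountValidReplicatingSlaves > 0:
		// Co-master reachable, or its slaves still replicate: not dead.
		return "NoProblem"
	case a.CountValidSlaves == a.CountSlaves:
		// Every slave is reachable, yet none replicates: co-master is dead.
		return "DeadCoMaster"
	case a.CountValidSlaves > 0:
		// Some slaves are unreachable as well.
		return "DeadCoMasterAndSomeSlaves"
	default:
		// No slave reachable at all; left unclassified in this sketch.
		return "NoProblem"
	}
}

func main() {
	fmt.Println(classify(analysisInput{IsCoMaster: true, CountSlaves: 3, CountValidSlaves: 3})) // DeadCoMaster
	fmt.Println(classify(analysisInput{IsCoMaster: true, CountSlaves: 3, CountValidSlaves: 1})) // DeadCoMasterAndSomeSlaves
}
```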
GA Release v1.4.437
Better handling of DeadCoMaster. Previously this used the same logic as DeadIntermediateMaster, but the two do not really behave the same way. The correct way to recover a dead co-master is to relocate all its slaves to its still-working co-master, and that's it. Furthermore, post-failover scripts should run as master-failover processes, not intermediate-master-failover processes. Recovery should only be automated if the cluster is eligible for automated-master-recovery, not automated-intermediate-master-recovery.
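A minimal sketch of the co-master rule stated above, over a hypothetical in-memory topology (`instance`, `recoverDeadCoMaster` and the string keys are illustrative, not orchestrator's API). It only rewires replication pointers; a real relocation would issue the corresponding replication commands, and the surviving co-master's own replication from its dead peer is left untouched here.

```go
package main

import (
	"fmt"
	"sort"
)

// instance is a hypothetical model of a MySQL server; MasterKey names the
// server it replicates from.
type instance struct {
	Key       string
	MasterKey string
}

// recoverDeadCoMaster relocates every slave of the dead co-master under the
// surviving co-master and changes nothing else. It returns the relocated
// keys, sorted.
func recoverDeadCoMaster(topology map[string]*instance, deadKey string) ([]string, error) {
	dead, ok := topology[deadKey]
	if !ok {
		return nil, fmt.Errorf("unknown instance %s", deadKey)
	}
	// In a master-master pair the dead co-master replicates from its peer.
	survivorKey := dead.MasterKey
	survivor, ok := topology[survivorKey]
	if !ok || survivor.MasterKey != deadKey {
		return nil, fmt.Errorf("%s is not part of a co-master pair", deadKey)
	}
	var relocated []string
	for _, inst := range topology {
		if inst.MasterKey == deadKey && inst.Key != survivorKey {
			inst.MasterKey = survivorKey // relocate under the surviving co-master
			relocated = append(relocated, inst.Key)
		}
	}
	sort.Strings(relocated)
	return relocated, nil
}

func main() {
	topology := map[string]*instance{
		"m1": {Key: "m1", MasterKey: "m2"},
		"m2": {Key: "m2", MasterKey: "m1"},
		"s1": {Key: "s1", MasterKey: "m1"},
		"s2": {Key: "s2", MasterKey: "m1"},
		"s3": {Key: "s3", MasterKey: "m2"},
	}
	relocated, err := recoverDeadCoMaster(topology, "m1")
	fmt.Println(relocated, err) // [s1 s2] <nil>
}
```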
GA Release v1.4.428
config:
- added PostponeSlaveRecoveryOnLagMinutes
postponed functions:
- introducing PostponedFunctionsContainer type (add to, invoke postponed functions)
- MultiMatchBelow() accepts PostponedFunctionsContainer; on single-slave buckets where the slave has SQLDelay, matching of said slave is postponed
- RegroupSlaves() accepts PostponedFunctionsContainer to support the above
- RegroupSlavesIncludingSubSlavesOfBinlogServers() accepts PostponedFunctionsContainer to support the above
- TopologyRecovery includes PostponedFunctionsContainer
- RecoverDeadMaster uses TopologyRecovery as the postponed functions container when calling RegroupSlavesIncludingSubSlavesOfBinlogServers()
- effectively: SQL-delayed replicas no longer block the entire recovery operation; their recovery is postponed instead (see PostponeSlaveRecoveryOnLagMinutes config).
GA Release v1.4.426
web:
- fixed bug in cluster.js for virtual node (threw JS error)
- worked around golang/json rendering an empty (nil) array as "null" instead of "[]"
GA Release v1.4.425
metrics:
- writing resolve_dao metrics
GA Release v1.4.424
recovery:
- introducing PostponedFunctions: may be added as recovery progresses; issued after all PostRecovery* processes execute
- RecoverDeadMaster uses postponed functions for executing detach-slave on lost slaves
- RecoverDeadMaster uses postponed functions for executing begin-downtime on lost slaves and the dead master
- RecoverDeadMaster uses postponed functions for executing repoint on binlog servers in a binlog server topology
async_request:
- added, but currently UNUSED, since postponed functions work well enough
- db: added async_request table
- added async_request.go, async_request_dao.go
- the idea is to be able to request operations such as "relocate" that will be executed asynchronously by the elected service (implementing a job queue)
- unsure when this will be used
metrics:
- fixed bug in metrics callbacks not being called
- instance_dao writes some instance read/write metrics
GA Release v1.4.421
db:
- added these columns to topology_recovery, making for a better audit:
  - participating_instances
  - lost_slaves
  - all_errors
recovery:
- more data in TopologyRecovery (see above); the class is now used for writes as well
- dao reads & writes the added columns in topology_recovery
- ResolveRecovery() uses recovery_id rather than the host/port combination
- a TopologyRecovery reference is used throughout the recovery process and holds the state of things (BTW this is the most stateful data in orchestrator at this time)
- TopologyRecovery object collects errors as recovery makes progress, for future visibility
- TopologyRecovery object collects lost slaves as recovery makes progress, for future visibility
- TopologyRecovery object collects participating instances as recovery makes progress, for future visibility
web:
- fixed bug in cluster.js leading to an infinite loop on master-master replication and loss of drag-n-drop functionality
GA Release v1.4.415
recovery:
- dao reads is_successful
- audit-recovery presents FAIL status for failed recoveries
GA Release v1.4.412
Fixed resolve issue leading to unjustified fatals