Fix rebalancing failover tooling #6095

fimanishi

What changed?
Excluded domains whose preferredCluster is not present in the cluster list of their ReplicationConfiguration.

Why?
Domains whose preferredCluster is not in their cluster list caused the workflow to panic when it tried to acquire the client for the preferredCluster from remoteFrontendClients.
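For illustration only, here is a minimal sketch of how a lookup like this can panic and how a checked lookup avoids it. It assumes `remoteFrontendClients` behaves like a map keyed by cluster name; the `frontendClient` interface, `stubClient`, and `clientFor` are hypothetical names and are not taken from the Cadence code.

```go
package main

import "fmt"

// frontendClient is a hypothetical stand-in for a remote frontend client.
type frontendClient interface {
	Failover(domain string) error
}

// stubClient is a no-op client used only for this sketch.
type stubClient struct{}

func (stubClient) Failover(domain string) error { return nil }

// clientFor returns the client registered for a cluster, or an error when
// none exists. An unchecked map lookup would return a nil interface for a
// missing key, and calling a method on it would panic.
func clientFor(clients map[string]frontendClient, cluster string) (frontendClient, error) {
	c, ok := clients[cluster]
	if !ok || c == nil {
		return nil, fmt.Errorf("no frontend client for cluster %q", cluster)
	}
	return c, nil
}

func main() {
	// Assumed setup: only cluster0 has a registered client.
	clients := map[string]frontendClient{"cluster0": stubClient{}}

	if _, err := clientFor(clients, "cluster2"); err != nil {
		fmt.Println("skipping:", err)
	}
}
```

The PR itself avoids the problem earlier, by excluding such domains before any client is requested.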

How did you test it?
unit tests and local replication tests

Potential risks
The workflow was generally not working before, because a preferredCluster that is missing from the domain's cluster list is a normal scenario. The risk introduced is that the workflow now actually executes and may do something unexpected. For now, we only rebalance domains that have a preferredCluster set and that can be rebalanced back to it.
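The selection rule described above could be sketched roughly as follows. This is not the code from `rebalance_workflow.go`: `domainInfo`, `canRebalance`, and `filterRebalanceable` are hypothetical names, and skipping domains that are already active in their preferred cluster is an assumption about what "can be rebalanced back" means.

```go
package main

import "fmt"

// domainInfo is a hypothetical, simplified view of the fields the
// rebalance decision needs; it is not the Cadence domain type.
type domainInfo struct {
	Name             string
	ActiveCluster    string
	PreferredCluster string   // empty when no preferred cluster is set
	Clusters         []string // cluster names from the replication configuration
}

// canRebalance reports whether a domain should be failed over back to its
// preferred cluster under the assumptions stated above.
func canRebalance(d domainInfo) bool {
	// No preferred cluster, or already there (assumption): nothing to do.
	if d.PreferredCluster == "" || d.PreferredCluster == d.ActiveCluster {
		return false
	}
	// The preferred cluster must be part of the domain's cluster list.
	for _, c := range d.Clusters {
		if c == d.PreferredCluster {
			return true
		}
	}
	return false
}

// filterRebalanceable returns the names of the domains that pass canRebalance.
func filterRebalanceable(domains []domainInfo) []string {
	var names []string
	for _, d := range domains {
		if canRebalance(d) {
			names = append(names, d.Name)
		}
	}
	return names
}

func main() {
	domains := []domainInfo{
		{Name: "domain-a", ActiveCluster: "cluster1", PreferredCluster: "cluster0", Clusters: []string{"cluster0", "cluster1"}},
		{Name: "domain-b", ActiveCluster: "cluster1", PreferredCluster: "cluster2", Clusters: []string{"cluster0", "cluster1"}},
		{Name: "domain-c", ActiveCluster: "cluster1", Clusters: []string{"cluster0", "cluster1"}},
	}
	fmt.Println(filterRebalanceable(domains))
}
```

In this sketch only `domain-a` is selected: `domain-b` prefers a cluster outside its list, and `domain-c` has no preferred cluster.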

Release notes

Documentation Changes


codecov bot commented Jun 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.15%. Comparing base (f0f7efd) to head (394650a).
Report is 11 commits behind head on master.

Additional details and impacted files
| Files | Coverage Δ |
|---|---|
| ...rvice/worker/failovermanager/rebalance_workflow.go | 100.00% <100.00%> (ø) |

... and 15 files with indirect coverage changes

@fimanishi fimanishi merged commit 04662c6 into cadence-workflow:master Jun 5, 2024
20 checks passed
@fimanishi fimanishi deleted the fix-rebalancing-failover-tooling branch June 5, 2024 22:18
timl3136 pushed a commit to timl3136/cadence that referenced this pull request Jun 6, 2024
timl3136 pushed a commit to timl3136/cadence that referenced this pull request Jun 6, 2024