Add pipeline to restart SDF/Rebaser/Pinga units on relevant nodes #5130

Merged: 1 commit, Dec 13, 2024

@johnrwatson johnrwatson commented Dec 13, 2024

Adds a new pipeline that we can trigger to restart the service replicas affected by our current runaway memory issue. It quickly drops the replicas and brings them back. For SDF, it puts the FE into maintenance mode first to give users a better experience. All the other services are stopped gracefully, so they should finish their current work before being started again.
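A minimal sketch of how a manually triggered pipeline of this shape could be laid out in GitHub Actions. This is not the merged workflow: the file names `set-maintenance-mode.yml` and `service-restart.yml`, the job names, and every input (`environment`, `service`, `enabled`) are assumptions made for illustration. Only the ordering is taken from the description above: maintenance mode goes on before SDF restarts, while the other services restart on their own.

```yaml
# Hypothetical sketch. All file names, job names, and inputs are assumptions.
name: Restart Services

on:
  workflow_dispatch:            # manual trigger from the Actions UI or API
    inputs:
      environment:
        description: Target environment (hypothetical input)
        type: string
        required: true

jobs:
  maintenance-on:
    # Hypothetical reusable workflow that puts the FE into maintenance mode.
    uses: ./.github/workflows/set-maintenance-mode.yml
    with:
      environment: ${{ inputs.environment }}
      enabled: true
    secrets: inherit

  restart-sdf:
    needs: maintenance-on       # SDF restarts only once maintenance mode is on
    uses: ./.github/workflows/service-restart.yml
    with:
      environment: ${{ inputs.environment }}
      service: sdf
    secrets: inherit

  restart-rebaser:
    # No maintenance mode here; the service is expected to stop gracefully.
    uses: ./.github/workflows/service-restart.yml
    with:
      environment: ${{ inputs.environment }}
      service: rebaser
    secrets: inherit

  restart-pinga:
    uses: ./.github/workflows/service-restart.yml
    with:
      environment: ${{ inputs.environment }}
      service: pinga
    secrets: inherit

  maintenance-off:
    needs: restart-sdf
    if: always()                # lift maintenance mode even if the SDF restart failed
    uses: ./.github/workflows/set-maintenance-mode.yml
    with:
      environment: ${{ inputs.environment }}
      enabled: false
    secrets: inherit
```

The `if: always()` on the last job is a design choice in this sketch, so that a failed restart does not leave the FE stuck in maintenance mode.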

In theory, it would be much better to take just one node offline at a time, but that would require a fair bit more rework in our toolbox.
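For illustration only, here is one way a one-node-at-a-time rollout might be expressed on the workflow side, using a matrix with `max-parallel: 1`. It assumes a reusable workflow (hypothetically `service-restart.yml`) that accepts a `node` input, which is exactly the toolbox capability this PR says does not exist yet. The node identifiers are placeholders.

```yaml
# Hypothetical sketch. The workflow name, inputs, and node identifiers are assumptions.
jobs:
  restart-rebaser:
    strategy:
      max-parallel: 1           # run one matrix entry at a time
      fail-fast: true           # cancel the remaining nodes if one restart fails
      matrix:
        node: [node-a, node-b, node-c]   # placeholder node identifiers
    uses: ./.github/workflows/service-restart.yml
    with:
      service: rebaser
      node: ${{ matrix.node }}  # hypothetical input; would need per-node support in the toolbox
    secrets: inherit
```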

We can schedule this like a maintenance event (using maintenance mode), and it should have minimal impact on users.

This is untestable until I merge it, because the workflow doesn't exist in the UI/API until it has been present on main at least once.

@github-actions github-actions bot added the A-ci Area: CI configuration files and scripts label Dec 13, 2024
@johnrwatson johnrwatson changed the title first attempt Add pipeline to restart SDF/Rebaser/Pinga units on all nodes Dec 13, 2024
@johnrwatson johnrwatson changed the title Add pipeline to restart SDF/Rebaser/Pinga units on all nodes Add pipeline to restart SDF/Rebaser/Pinga units on relevant nodes Dec 13, 2024
@johnrwatson johnrwatson force-pushed the feat/add-unit-restart-for-memory-management branch 2 times, most recently from d5c8f71 to 4ade1e1 Compare December 13, 2024 20:46
@johnrwatson johnrwatson marked this pull request as ready for review December 13, 2024 20:48
jobs:

  restart-rebaser:
    uses: ./.github/workflows/instance-refresh.yml
Contributor

Did you mean to use instance-refresh here, or the new service-restart you wrote?

Contributor Author

Oh goodness, great catch

@johnrwatson johnrwatson force-pushed the feat/add-unit-restart-for-memory-management branch from 6af74a1 to aeba0ea Compare December 13, 2024 22:24
@johnrwatson johnrwatson added this pull request to the merge queue Dec 13, 2024
Merged via the queue into main with commit f2e86ce Dec 13, 2024
7 checks passed
@johnrwatson johnrwatson deleted the feat/add-unit-restart-for-memory-management branch December 13, 2024 22:30
Labels
A-ci Area: CI configuration files and scripts

2 participants