Rework CI/CD #2037

Open
6 tasks
asafpamzn opened this issue Jul 28, 2024 · 1 comment
Assignees
Labels
CI/CD CI/CD related Tech debt
Milestone

Comments

@asafpamzn
Collaborator

asafpamzn commented Jul 28, 2024

The CI platform has become a major limiting factor for robust development, as was evident during the 1.2 release.
As the number of supported configurations grows, a full CI run that takes 20+ hours to complete is unacceptable.
The primary weak point is the current self-hosted runner, which is a single, manually managed EC2 instance. This approach does not scale.
Instead, we should consider a Docker-based approach using the AWS Fargate service.
In addition, we should consider doing so for all tests, not only those running on the self-hosted runner, since it would potentially be more scalable (bypassing GitHub concurrency limits).

Additional improvements to consider

redis-rs CI takes too long to finish and needs to be shortened and improved:

  • Strip unneeded cases: features, sync tests, version tests, etc.
  • Fix the use of hard-coded server ports so that tests can run multithreaded.
  • Run tests with cargo nextest.
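One common way to avoid hard-coded ports is to bind to port 0 and let the OS pick a free one. The sketch below only illustrates the idea and is not taken from the redis-rs test harness; `reserve_local_port` is a hypothetical helper name.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Hypothetical helper: ask the OS for a free TCP port by binding to port 0.
/// Returns the listener (which keeps the port reserved while it is alive)
/// together with the address the OS assigned.
fn reserve_local_port() -> io::Result<(TcpListener, SocketAddr)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

Each test would call `reserve_local_port()` and use `addr.port()` instead of a fixed number, so tests running in parallel do not collide. If the port has to be handed to an external server process, the listener must be dropped first, which leaves a small window in which another process could claim the port; a retry on startup failure is one possible way to handle that.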

Glide enhancement

  • Use cached Rust builds for PRs that do not change Rust code.
  • Some language build and test steps can be replaced by much faster tools, which would make Glide CI and CD faster.
  • Move the macOS runner to a macOS container running on Ubuntu to save CI costs. This can be done easily using the services field of a job.
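Reusing cached Rust builds depends on first deciding whether a PR touches Rust code at all. A minimal sketch of that check follows, assuming the list of changed paths is already available from the CI system. The function name `touches_rust_build` and the rule (any `.rs` file, `Cargo.toml`, or `Cargo.lock`) are assumptions for illustration, not the project's actual policy.

```rust
/// Hypothetical check: returns true when any changed path looks like it
/// affects the Rust build (a `.rs` source file or a Cargo manifest/lockfile).
fn touches_rust_build(changed_paths: &[&str]) -> bool {
    changed_paths.iter().any(|&path| {
        // Take the last path component so nested manifests are matched too.
        let file_name = path.rsplit('/').next().unwrap_or(path);
        path.ends_with(".rs") || file_name == "Cargo.toml" || file_name == "Cargo.lock"
    })
}
```

When this returns false, a workflow could restore the cached Rust artifacts instead of rebuilding them. A real rule would probably also need to cover build scripts, toolchain files, and the workflow files themselves.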

Also to consider #2544

@asafpamzn asafpamzn added this to the node-GA milestone Jul 28, 2024
@asafpamzn asafpamzn moved this from OPS to In Progress in Valkey-GLIDE - internal Aug 4, 2024
@avifenesh avifenesh changed the title CI/CD improvments CI/CD improvments - NTH Aug 18, 2024
@asafpamzn asafpamzn modified the milestones: node-GA, Post GA Aug 25, 2024
@avifenesh avifenesh changed the title CI/CD improvments - NTH CI/CD improvments - lists Oct 11, 2024
@avifenesh avifenesh changed the title CI/CD improvments - lists CI/CD improvments - list Oct 11, 2024
@avifenesh avifenesh removed this from the Post GA milestone Oct 11, 2024
@avifenesh avifenesh added CI/CD CI/CD related 1_3_candidate labels Nov 9, 2024
@ikolomi ikolomi changed the title CI/CD improvments - list Rework CI/CD Nov 28, 2024
@ikolomi ikolomi assigned ikolomi and unassigned avifenesh Nov 28, 2024
@asafpamzn asafpamzn added this to the 1.3 milestone Dec 25, 2024
@ikolomi
Collaborator

ikolomi commented Dec 30, 2024

The first issue that needs to be addressed is the self-hosted runner. It is the main bottleneck, since jobs run sequentially and it often requires manual maintenance.

The alternative currently being studied is switching to a Kubernetes cluster, which is supported by GitHub Actions.
Regardless of which scalable compute infrastructure is eventually selected (EKS, ECS), scaling requires updating the workflow files, since they are written with the assumption that all jobs run on the same host.
EE 14d

Projects
Status: In Progress
Development

No branches or pull requests

3 participants