ci: add openshift on-demand cluster tests #1816

Merged: 145 commits, Jul 4, 2024

@leiicamundi
Contributor

leiicamundi commented May 17, 2024

Which problem does the PR fix?

This PR addresses the limitation in our current CI pipeline for the Helm chart repository, which only tests against a single version of OpenShift. By modifying the pipeline to test against multiple OpenShift versions, we aim to enhance compatibility and ensure the Helm charts work seamlessly across different OpenShift environments.

What's in this PR?

In this PR, we have made the following changes:

  • Modified the Helm chart CI pipeline to incorporate testing against multiple versions of OpenShift.
  • Implemented necessary adjustments to the CI configuration to enable testing across different OpenShift versions.
  • Ensured that the CI pipeline passes tests for all integrated OpenShift versions without requiring manual intervention.

Additionally, as part of this PR, the following secrets were added:

  • DISTRO_CI_REDHAT_CONSOLE_TOKEN: Token to add the cluster on the Red Hat console.
  • DISTRO_CI_AWS_ACCESS_KEY, DISTRO_CI_AWS_SECRET_KEY, DISTRO_CI_AWS_PROFILE: Credentials to authenticate with AWS.
  • DISTRO_CI_OPENSHIFT_TFSTATE_BUCKET: Storage for Terraform state files.
  • DISTRO_CI_ON_DEMAND_EXTERNAL_DNS_GCP_SERVICE_ACCOUNT, DISTRO_CI_ON_DEMAND_CERT_MANAGER_GCP_SERVICE_ACCOUNT: GCP service accounts used for DNS access by external DNS and cert-manager.

Checklist

Before opening the PR:

  • Ran make go.update-golden-only in the repo's root dir.
  • Verified that no other open pull requests address the same update/change.
  • Added tests for charts, if needed.
  • Updated in-repo documentation, if necessary.

After opening the PR:

  • Signed the CLA (Contributor License Agreement).
  • Ensured all checks/tests pass in the PR.

To-Do

  • Discuss the approach further if the PR is not complete.
  • Address any feedback or suggestions provided during the review process.

@leiicamundi self-assigned this May 17, 2024
@leiicamundi added the platform/openshift (Issues related to OpenShift) label May 24, 2024
@leiicamundi
Contributor Author

Hey there!

Overview

For this feature implementation, we needed to extract the test logic into a reusable action called chart-test-recipes.

Key Changes

  1. Reusable Action:
    • Created chart-test-recipes to handle test logic.
  2. Updated Existing Action:
    • Modified /.github/actions/workflow-vars to include missing environment values required for the tests.
    • Added outputs for easier use.
  3. Test Template:
    • Updated to call chart-test-recipes.
    • Added a safeguard to ensure there's always an identifier, which is handy for debugging when triggered from a commit.

Technical Challenges

  1. On-Demand Cluster Configuration:
    • On-demand clusters need a configuration similar to the existing permanent clusters.
    • Created fixtures/clusters/rosa-hcp-on-demand, copying the standard cluster configuration from the private camunda/distribution repo.
    • Added the ServiceAccounts for cert-manager and external DNS interaction to the secrets.
  2. Dedicated ROSA Test Workflow:
    • Steps include:
      • Preparing for cluster creation via matrices.
      • Creating clusters.
      • Standardizing cluster configuration.
      • Running various tests on each cluster.
      • Cleaning up and generating error reports.
  3. Scheduling and Reporting:
    • Tests run every other day, and a report is generated on failure.
    • Orphaned clusters are cleaned up daily (daily-cleanup-rosa.yml); a report is generated if the cleanup fails.
  4. Matrix Implementation:
    • The ROSA cluster matrix is stored in configs/tests-integration-rosa-matrix.yml.
    • A file is needed because the matrix has to be referenced from multiple jobs, which isn't possible otherwise.
    • Used cloudposse/github-action-matrix-outputs-write for handling matrix outputs (see GitHub Community Discussion).
    • The kubeconfig is encrypted with the GitHub token of the action to ensure it's never exposed.
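
For reviewers who want a mental model of the matrix step, here is a minimal Python sketch of how one shared matrix can expand into per-cluster test runs. It is not the workflow code from this PR: the matrix keys, the version and scenario values, and the `expand_matrix` and `cluster_name` helpers are all hypothetical. The real definition is in configs/tests-integration-rosa-matrix.yml.

```python
from itertools import product

# Hypothetical matrix content. The actual file is not shown in this thread,
# so these keys and values are assumptions made for illustration only.
MATRIX = {
    "openshift_version": ["4.14", "4.15"],
    "scenario": ["install", "upgrade"],
}


def expand_matrix(matrix):
    """Return one dict per combination of matrix values, like a CI job matrix."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def cluster_name(entry, prefix="ci"):
    """Build a hypothetical cluster identifier; it depends only on the version."""
    return f"{prefix}-rosa-{entry['openshift_version'].replace('.', '-')}"


runs = expand_matrix(MATRIX)
for run in runs:
    # Every scenario for a given version targets the same on-demand cluster.
    print(cluster_name(run), run["scenario"])
```

Keeping the matrix in a single place means the create, test, and cleanup jobs can all iterate over the same entries instead of each redefining them.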

Summary

  • Restructured to support on-demand cluster management.
  • Scalable approach for future integration with other clusters like EKS.

Testing and Results

Thank you in advance for the review!

@leiicamundi marked this pull request as ready for review May 28, 2024 19:06
@leiicamundi requested a review from Langleu May 28, 2024 19:06
@leiicamundi
Contributor Author

leiicamundi commented Jun 28, 2024

Hi @aabouzaid,

I have rebased the branch on main and reintegrated the features that were added in the meantime.

I also took the opportunity to add comments on the points we discussed.

Regarding the generation of GH tokens, it has been extracted into the calling workflows as requested.
For the cluster setup, it has now been extracted into the distribution repo. Can you please review the following 2 PRs which are dependencies of this PR: https://github.com/camunda/distribution/pull/277; https://github.com/camunda/distribution/pull/266

Slack alerting has been added, but we will need to test it once we agree on the content of the PR.

Please hold off on merging until I have run the Slack tests and the ROSA on-demand integration tests after the review (I need the IaC PR to be merged first).

The standard tests are green, hopefully nothing broke :)

Thank you in advance!

@aabouzaid
Member

@leiicamundi Thanks a lot for the split, it uses the same methods we use in the team and is much easier to follow and review.
I've reviewed the 2 Distro PRs and left a couple of comments there 👍

I just have a question about cloning the cluster resources from the Distro repo.
It's not clear to me where that will happen. 🤔

@leiicamundi
Contributor Author

Hi @aabouzaid, thanks for the reviews.

The clone of the distribution repo is referenced here

The next step then templates the values and the cluster configuration.

@leiicamundi
Contributor Author

@aabouzaid, can you do a final review before the merge?
I have rebased the branch on main.

However, I noticed that the upgrade tests are taking more time and resources than they did a month ago. I had to increase the cluster by 4 CPUs. Is there a reason for this?

I observed the same phenomenon in the integration tests on the permanent clusters. Looking at the history, we can see that it used to take 10 minutes (https://github.com/camunda/camunda-platform-helm/actions/runs/9408861180/job/25917664656) compared to 16 minutes today (https://github.com/camunda/camunda-platform-helm/actions/runs/9799006610/job/27058448099?pr=1816, https://github.com/camunda/camunda-platform-helm/actions/runs/9782418024/job/27008671740).

@aabouzaid
Member

> @aabouzaid, can you do a final review before the merge? I have rebased the branch on main.
>
> However, I noticed that the upgrade tests are taking more time and resources than they did a month ago. I had to increase the cluster by 4 CPUs. Is there a reason for this?
>
> I observed the same phenomenon in the integration tests on the permanent clusters. Looking at the history, we can see that it used to take 10 minutes (https://github.com/camunda/camunda-platform-helm/actions/runs/9408861180/job/25917664656) compared to 16 minutes today (https://github.com/camunda/camunda-platform-helm/actions/runs/9799006610/job/27058448099?pr=1816, https://github.com/camunda/camunda-platform-helm/actions/runs/9782418024/job/27008671740).

No idea, but we will investigate it later.

Thanks for the changes, looks good to me 🙌
I will merge it shortly.

@aabouzaid merged commit 4c90153 into main Jul 4, 2024
3 of 7 checks passed
@aabouzaid deleted the feature/openshift-tests branch July 4, 2024 22:24
@aabouzaid changed the title from "feat(openshift): ci on-demand cluster openshift tests" to "ci: add openshift on-demand cluster tests" Jul 4, 2024
@aabouzaid added this to the 8.6 Release Cycle milestone Jul 4, 2024
@aabouzaid added the size/m (Relative effort/time: Medium) and cycle/alpha4 (Tasks will be done in alpha4 cycle) labels Jul 4, 2024
Labels: area/ci, cycle/alpha4 (Tasks will be done in alpha4 cycle), platform/openshift (Issues related to OpenShift), size/m (Relative effort/time: Medium)
6 participants