Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phd: add basic "migration-from-base" tests + machinery #609

Merged
merged 23 commits into from
Jan 21, 2024

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Jan 16, 2024

In order to ensure that changes to propolis don't break instance
migration from previous versions, we would like to add automated testing
of migrating an instance from the current master branch of propolis
to run on PR branches. This PR adds an implementation of such a test to
phd.

To implement this, I've built on top of my change from PR #604 and
modified the phd artifact store to introduce a notion of a "base"
Propolis server artifact. This artifact can then be used to test
migration from the "base" Propolis version to the revision under
test. I've added a new test case in migrate.rs that creates a source
VM using the "base" Propolis artifact and attempts to migrate that
instance to a target VM running on the "default" Propolis artifact (the
revision being tested). In order to add the new test, I've factored out
test code from the existing migrate::smoke_test test.

How phd should acquire a "base" Propolis artifact is configured by
several new command-line arguments. --base-propolis-branch takes
the name of a Git branch on the propolis repo. If this argument is
provided, PHD will download the Propolis debug artifact from the HEAD
commit of that branch from Buildomat. Alternatively, the
--base-propolis-commit argument accepts a Git commit hash to
download from Buildomat. Finally, the --base-propolis-cmd argument
takes a local path to a binary to use as the "base" Propolis. All
these arguments are mutually exclusive, and if none of them are
provided, the migration-from-base tests are skipped.

When the "base" Propolis artifact is configured from a Git branch
name (i.e. the --base-propolis-branch CLI argument is passed), we
use the Buildomat /public/branch/{repo}/{branch-name} endpoint, which
returns the Git hash of the HEAD commit to that branch. Then, we attempt
to download an artifact from Buildomat for that commit hash. An issue
here is that Buildomat's branch endpoint will return the latest commit
hash for that branch as soon as it sees a commit, but the artifact for
that commit may not have been published yet, so downloading it will
fail. Ideally, we could resolve this sort of issue by configuring the
phd-run job for PRs to depend on the phd-build job for master, so
that the branch's test run isn't started until any commits that just
merged to master have published artifacts. However, this isn't
basely possible in Buildomat (see oxidecomputer/buildomat#46). As a
temporary workaround, I've added code to the PHD artifact store to retry
downloading Buildomat artifacts with an exponential backoff, for up to a
configurable duration (defaulting to 20 minutes). This allows us to wait
for an in-progress build to complete, with a limit on how long we'll
wait for.

Depends on #604

@hawkw hawkw requested a review from gjcolombo January 16, 2024 21:39
@hawkw hawkw added the testing Related to testing and/or the PHD test framework. label Jan 16, 2024
@hawkw
Copy link
Member Author

hawkw commented Jan 16, 2024

Hm, okay, there seems to be a bit of an issue where, when a PR has just merged to master and we get its hash from buildomat, the public/branch/{repo}/master endpoint will have been updated with the new commit hash before the artifacts for that hash have actually been built. If we then attempt to download the artifact for that commit when it hasn't been built yet, we get a 500 error from buildomat, which seems unfortunate.

@hawkw
Copy link
Member Author

hawkw commented Jan 17, 2024

Regarding #609 (comment), I spoke to @jclulow yesterday a bit about what to do when Buildomat has updated the branch's HEAD commit but hasn't actually finished building artifacts for that revision yet. He suggested that we could potentially add the ability to declare that a Buildomat job depends on a different job having uploaded a file (oxidecomputer/buildomat#46), which would let us say that the phd_run job for branches depends on the phd_build job for the master branch having actually published the propolis artifact. That way, we don't start running the tests until any currently in flight build for the master branch artifact is finished, so we don't fail tests due to that artifact being unavailable.

I'd like to potentially look into implementing that in Buildomat. In the meantime, though, I was thinking about changing PHD's artifact store to use a larger number of retries with an exponential backoff duration when downloading buildomat artifacts, so that we end up waiting longer for a potentially-in-flight build to finish. @gjcolombo, what do you think about that as a way to unblock this change before we've made the upstream changes to buildomat?

Base automatically changed from eliza/phd-crucible to master January 17, 2024 22:59
@hawkw hawkw force-pushed the eliza/phd-propolis branch from 2e4d4bc to 5e3272d Compare January 17, 2024 23:01
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 257 files.

Valid Invalid Ignored Fixed
192 1 64 0
Click to see the invalid file list
  • phd-tests/framework/src/artifacts/buildomat.rs

@hawkw hawkw marked this pull request as ready for review January 18, 2024 21:54
@hawkw hawkw changed the title phd: fetch the latest master Propolis, add migration from master test phd: add basic "migration-from-current" test + machinery Jan 18, 2024
Copy link
Contributor

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pulling this together!

@@ -105,3 +99,50 @@ fn multiple_migrations(ctx: &Framework) {
"I have migrated!"
);
}

fn run_smoke_test(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious what this would look like in the context of the lifecycle framework in framework/src/lifecycle.rs (I don't think I pointed you at this previously--mea culpa for that). That module was meant to implement something similar to what you have here: start the VM, interact with it, then stop/start or migrate it and ensure that invariants (in this case the existence of foo.bar) are upheld across each of those transitions.

(I think if we go this route for this test, we should move the serial console history checks into their own test case.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, neat, I hadn't seen that module --- will look!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I rewrote these to use the lifecycle framework, and made a separate test for serial history. I've also added a lifecycle test for multiple migrations between the "current" and "default" Propolis artifacts. LMKWYT?

phd-tests/framework/src/artifacts/download.rs Outdated Show resolved Hide resolved
phd-tests/framework/src/artifacts/buildomat.rs Outdated Show resolved Hide resolved
phd-tests/framework/src/artifacts/mod.rs Show resolved Hide resolved
@hawkw
Copy link
Member Author

hawkw commented Jan 19, 2024

Btw, @gjcolombo, a non-blocking question: how do you feel about the terminology I've used in this PR? In the artifact store name and in the CLI, we currently refer to the Propolis artifact that we use to represent the HEAD revision as "current Propolis". I chose that name over "HEAD Propolis" because there are CLI options to use an arbitrary Buildomat commit, or a local file, as that artifact, and in that case, they may not actually be from the HEAD commit of a branch. But, "current" feels a bit ambiguous --- it feels like the revision under test could also be called the "current" Propolis, and, indeed, while demoing this change, I think I referred to the revision under test as "current" a few times...if you have any preferences, I'd love to hear them. :)

@hawkw hawkw requested a review from gjcolombo January 19, 2024 23:13
@gjcolombo
Copy link
Contributor

In the long run I'd like to see us have at least three well-known Propolis artifacts:

  • the Propolis binary that's under test
  • the most recent Propolis binary from Buildomat, based on Buildomat's definition of "most recent" for master
  • the Propolis binary that was in Omicron's package manifest at a given Omicron commit (not necessarily from main; I think this is more useful if it's a tagged release commit, though this will require a little more regular maintenance than the just-ask-Buildomat case)

The idea would be to verify not only that incremental build-over-build upgrades work, but that upgrading from the most recent customer release works, since that's the migration we'll actually have to do to upgrade a VM on a customer rack. So on that view I think the artifact labels I might use would be more like __PROPOLIS_UNDER_TEST, __PROPOLIS_BUILDOMAT_MASTER, and __PROPOLIS_OMICRON_PACKAGE (with constant names tweaked roughly to match). WDYT?

@hawkw
Copy link
Member Author

hawkw commented Jan 19, 2024

The idea would be to verify not only that incremental build-over-build upgrades work, but that upgrading from the most recent customer release works, since that's the migration we'll actually have to do to upgrade a VM on a customer rack. So on that view I think the artifact labels I might use would be more like __PROPOLIS_UNDER_TEST, __PROPOLIS_BUILDOMAT_MASTER, and __PROPOLIS_OMICRON_PACKAGE (with constant names tweaked roughly to match). WDYT?

Overall, this naming scheme seems pretty good to me --- we can call it __PROPOLIS_BUILDOMAT_MASTER for now. However, we currently support multiple CLI args for setting where this artifact actually comes from; if I pass --current-propolis-cmd, it's no longer from "master" or from "buildomat", and if I pass --current-propolis-commit or --current-propolis-branch foobar, it's coming from Buildomat but it's not necessarily coming from "master".

We could just remove those CLI options, though. I only added them because we agreed that downloading the artifact at all should be opt-in with a CLI argument, and I figured that, while I was doing that, we ought to make it maximally configurable. But, we could simplify the CLI args to just --with-master-propolis or something that's just a boolean opt-in to downloading the master artifact at all..

Copy link
Contributor

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (pending whatever adjustments you make to the artifact names per our discussion in the comments). Thanks again for picking this up!

@gjcolombo
Copy link
Contributor

But, we could simplify the CLI args to just --with-master-propolis or something that's just a boolean opt-in to downloading the master artifact at all..

Maybe --add-buildomat-propolis?

Copy link
Contributor

@gjcolombo gjcolombo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(LGTM'ing again for the new test case arrangement.)

@hawkw hawkw changed the title phd: add basic "migration-from-current" test + machinery phd: add basic "migration-from-base" tests + machinery Jan 19, 2024
@hawkw hawkw merged commit a13ebb9 into master Jan 21, 2024
10 checks passed
@hawkw hawkw deleted the eliza/phd-propolis branch January 21, 2024 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Related to testing and/or the PHD test framework.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants