phd: add basic "migration-from-base" tests + machinery #609
Hm, okay, there seems to be a bit of an issue where, when a PR has just merged to …
Regarding #609 (comment), I spoke to @jclulow yesterday about what to do when Buildomat has updated the branch's HEAD commit but hasn't actually finished building artifacts for that revision yet. He suggested that we could add the ability to declare that a Buildomat job depends on a different job having uploaded a file (oxidecomputer/buildomat#46), which would let us say that the […]. I'd like to look into implementing that in Buildomat. In the meantime, though, I was thinking about changing PHD's artifact store to use more retries with exponential backoff when downloading Buildomat artifacts, so that we wait longer for a potentially in-flight build to finish. @gjcolombo, what do you think about that as a way to unblock this change before we've made the upstream changes to Buildomat?
license-eye has checked 257 files in total.
| Valid | Invalid | Ignored | Fixed |
|---|---|---|---|
| 192 | 1 | 64 | 0 |
Invalid files:
- phd-tests/framework/src/artifacts/buildomat.rs
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
… `master` Propolis, add migration from `master` test
Thanks for pulling this together!
phd-tests/tests/src/migrate.rs
@@ -105,3 +99,50 @@ fn multiple_migrations(ctx: &Framework) {
"I have migrated!"
);
}

fn run_smoke_test(
I'm curious what this would look like in the context of the lifecycle framework in `framework/src/lifecycle.rs` (I don't think I pointed you at this previously; mea culpa for that). That module was meant to implement something similar to what you have here: start the VM, interact with it, then stop/start or migrate it and ensure that invariants (in this case the existence of `foo.bar`) are upheld across each of those transitions.
(I think if we go this route for this test, we should move the serial console history checks into their own test case.)
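The lifecycle pattern described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the API of `framework/src/lifecycle.rs`: `Action`, `MockVm`, and `run_lifecycle` are invented names, and the "VM" is just an in-memory set of guest files. It shows the shape of the idea: apply each transition in turn, then re-check the invariant (here, that `foo.bar` still exists) after every one, failing on the first violation.

```rust
use std::collections::HashSet;

/// Hypothetical transition applied to the VM under test.
#[derive(Debug, Clone, Copy)]
enum Action {
    StopAndStart,
    MigrateTo(&'static str),
}

/// Hypothetical stand-in for a guest: tracks which Propolis artifact it runs
/// on and which files exist in its filesystem.
struct MockVm {
    artifact: &'static str,
    files: HashSet<String>,
    running: bool,
}

impl MockVm {
    fn apply(&mut self, action: Action) {
        match action {
            Action::StopAndStart => {
                self.running = false;
                // Guest disk contents survive a stop/start, so `files` is untouched.
                self.running = true;
            }
            Action::MigrateTo(target) => {
                // Migration carries guest state over to the target artifact.
                self.artifact = target;
            }
        }
    }
}

/// Applies each action and re-checks the invariant after every transition.
/// Returns how many checks passed, or an error naming the failing action.
fn run_lifecycle<F>(vm: &mut MockVm, actions: &[Action], check: F) -> Result<usize, String>
where
    F: Fn(&MockVm) -> bool,
{
    let mut checks = 0;
    for &action in actions {
        vm.apply(action);
        checks += 1;
        if !check(&*vm) {
            return Err(format!("invariant violated after {:?}", action));
        }
    }
    Ok(checks)
}

fn main() {
    let mut vm = MockVm { artifact: "base", files: HashSet::new(), running: true };
    // "Interact with" the guest before any transitions.
    vm.files.insert("foo.bar".to_string());

    // Alternate between artifacts, with a stop/start in the middle.
    let actions = [
        Action::MigrateTo("default"),
        Action::MigrateTo("base"),
        Action::StopAndStart,
        Action::MigrateTo("default"),
    ];
    let result = run_lifecycle(&mut vm, &actions, |vm| vm.running && vm.files.contains("foo.bar"));
    println!("{:?}, final artifact: {}", result, vm.artifact);
}
```

Alternating `MigrateTo` actions between the "base" and "default" artifacts covers the multiple-migration case with the same machinery; a real harness would swap `MockVm` for a running guest and the closure for a command run inside it.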
Oh, neat, I hadn't seen that module; will look!
Okay, I rewrote these to use the lifecycle framework, and made a separate test for serial history. I've also added a lifecycle test for multiple migrations between the "current" and "default" Propolis artifacts. LMKWYT?
Btw, @gjcolombo, a non-blocking question: how do you feel about the terminology I've used in this PR? In the artifact store name and in the CLI, we currently refer to the Propolis artifact that we use to represent the HEAD revision as "current Propolis". I chose that name over "HEAD Propolis" because there are CLI options to use an arbitrary Buildomat commit, or a local file, as that artifact, and in that case, it may not actually be from the HEAD commit of a branch. But "current" feels a bit ambiguous: the revision under test could also be called the "current" Propolis, and indeed, while demoing this change, I think I referred to the revision under test as "current" a few times. If you have any preferences, I'd love to hear them. :)
In the long run I'd like to see us have at least three well-known Propolis artifacts:
The idea would be to verify not only that incremental build-over-build upgrades work, but that upgrading from the most recent customer release works, since that's the migration we'll actually have to do to upgrade a VM on a customer rack. So on that view I think the artifact labels I might use would be more like …
Overall, this naming scheme seems pretty good to me; we can call it … We could just remove those CLI options, though. I only added them because we agreed that downloading the artifact at all should be opt-in with a CLI argument, and I figured that, while I was doing that, we ought to make it maximally configurable. But we could simplify the CLI args to just …
LGTM (pending whatever adjustments you make to the artifact names per our discussion in the comments). Thanks again for picking this up!
Maybe …
(LGTM'ing again for the new test case arrangement.)
In order to ensure that changes to `propolis` don't break instance migration from previous versions, we would like PR branches to run an automated test that migrates an instance from the current `master` branch of `propolis` to the revision under test. This PR adds an implementation of such a test to `phd`.

To implement this, I've built on top of my change from PR #604 and modified the `phd` artifact store to introduce the notion of a "base" Propolis server artifact. This artifact can then be used to test migration from the "base" Propolis version to the revision under test. I've added a new test case in `migrate.rs` that creates a source VM using the "base" Propolis artifact and attempts to migrate that instance to a target VM running on the "default" Propolis artifact (the revision being tested). In order to add the new test, I've factored out test code from the existing `migrate::smoke_test` test.

How `phd` should acquire a "base" Propolis artifact is configured by several new command-line arguments. `--base-propolis-branch` takes the name of a Git branch on the `propolis` repo. If this argument is provided, PHD will download the Propolis debug artifact for the HEAD commit of that branch from Buildomat. Alternatively, the `--base-propolis-commit` argument accepts a Git commit hash to download from Buildomat. Finally, the `--base-propolis-cmd` argument takes a local path to a binary to use as the "base" Propolis. All these arguments are mutually exclusive, and if none of them are provided, the migration-from-base tests are skipped.
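A minimal, std-only sketch of how the three mutually exclusive flags might collapse into a single optional source. Only the flag names come from this PR; `BasePropolisSource`, `base_source`, and `run_migration_from_base` are hypothetical. A real CLI would more likely let its argument parser enforce the conflicts; here the `match` spells them out so that "no flag given" maps to `Ok(None)` (skip the tests) and any combination of two or more flags is an error.

```rust
use std::path::PathBuf;

/// Hypothetical type: where to obtain the "base" Propolis artifact.
#[derive(Debug, PartialEq)]
enum BasePropolisSource {
    /// `--base-propolis-branch`: the HEAD commit of a branch, via Buildomat.
    Branch(String),
    /// `--base-propolis-commit`: a specific commit, via Buildomat.
    Commit(String),
    /// `--base-propolis-cmd`: a local binary.
    LocalCmd(PathBuf),
}

/// Turns the three optional flags into at most one source.
/// `Ok(None)` means "skip the migration-from-base tests".
fn base_source(
    branch: Option<String>,
    commit: Option<String>,
    cmd: Option<PathBuf>,
) -> Result<Option<BasePropolisSource>, String> {
    match (branch, commit, cmd) {
        (None, None, None) => Ok(None),
        (Some(b), None, None) => Ok(Some(BasePropolisSource::Branch(b))),
        (None, Some(c), None) => Ok(Some(BasePropolisSource::Commit(c))),
        (None, None, Some(p)) => Ok(Some(BasePropolisSource::LocalCmd(p))),
        _ => Err("--base-propolis-branch, --base-propolis-commit, and \
                  --base-propolis-cmd are mutually exclusive"
            .to_string()),
    }
}

/// Hypothetical test entry point: skips itself when no base is configured.
fn run_migration_from_base(source: Option<&BasePropolisSource>) -> &'static str {
    match source {
        None => "skipped",
        Some(_) => "ran",
    }
}

fn main() {
    let source = base_source(Some("master".to_string()), None, None).unwrap();
    println!("{:?}", source);
    println!("{}", run_migration_from_base(source.as_ref()));
    // Passing two flags at once is rejected.
    println!("{:?}", base_source(Some("master".to_string()), Some("abc123".to_string()), None));
}
```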

When the "base" Propolis artifact is configured from a Git branch name (i.e., the `--base-propolis-branch` CLI argument is passed), we use the Buildomat `/public/branch/{repo}/{branch-name}` endpoint, which returns the Git hash of the HEAD commit of that branch. Then, we attempt to download an artifact from Buildomat for that commit hash. An issue here is that Buildomat's branch endpoint will return the latest commit hash for that branch as soon as it sees a commit, but the artifact for that commit may not have been published yet, so downloading it will fail. Ideally, we could resolve this sort of issue by configuring the `phd-run` job for PRs to depend on the `phd-build` job for `master`, so that the branch's test run isn't started until any commits that just merged to `master` have published artifacts. However, this isn't currently possible in Buildomat (see oxidecomputer/buildomat#46). As a temporary workaround, I've added code to the PHD artifact store to retry downloading Buildomat artifacts with exponential backoff, for up to a configurable duration (defaulting to 20 minutes). This allows us to wait for an in-progress build to complete, while bounding how long we'll wait.
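The resolve-then-retry flow can be sketched like this, assuming a blocking download and a doubling delay. Only the `/public/branch/{repo}/{branch-name}` path and the 20-minute default come from this PR; `branch_url`, `retry_with_backoff`, the placeholder server URL, the 1-second initial delay, and the doubling factor are illustrative assumptions. The `sleep` callback is injected so the logic can be exercised without actually waiting; real code would pass `std::thread::sleep` or an async timer.

```rust
use std::time::Duration;

/// Hypothetical helper: the Buildomat endpoint that maps a branch name to the
/// commit hash at its HEAD. `base` is the Buildomat server URL (assumed).
fn branch_url(base: &str, repo: &str, branch: &str) -> String {
    format!("{}/public/branch/{}/{}", base, repo, branch)
}

/// Hypothetical helper: calls `attempt` until it succeeds, sleeping with
/// exponentially growing delays between failures. Gives up, returning the
/// last error, once the total time slept reaches `max_wait`.
fn retry_with_backoff<T, E>(
    mut attempt: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
    initial_delay: Duration,
    max_wait: Duration,
) -> Result<T, E> {
    // A zero delay would never grow, so the loop could spin forever.
    assert!(!initial_delay.is_zero(), "initial_delay must be non-zero");
    let mut delay = initial_delay;
    let mut waited = Duration::ZERO;
    loop {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(err) if waited >= max_wait => return Err(err),
            Err(_) => {
                // Clamp the final sleep so we never overshoot the deadline.
                let pause = delay.min(max_wait - waited);
                sleep(pause);
                waited += pause;
                delay = delay.saturating_mul(2);
            }
        }
    }
}

fn main() {
    println!("{}", branch_url("https://buildomat.example", "propolis", "master"));

    // Simulate an artifact that is only published on the fourth attempt.
    let mut attempts = 0;
    let mut slept = Vec::new();
    let result: Result<&str, &str> = retry_with_backoff(
        || {
            attempts += 1;
            if attempts < 4 { Err("artifact not published yet") } else { Ok("propolis-server") }
        },
        |d| slept.push(d),
        Duration::from_secs(1),
        Duration::from_secs(20 * 60), // the PR's stated default limit
    );
    println!("{:?} after {} attempts; slept {:?}", result, attempts, slept);
}
```

Clamping the last sleep to the remaining budget keeps the total wait at or below `max_wait`, and one final attempt is made right at the deadline before giving up.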
Depends on #604