phd: add basic "migration-from-base" tests + machinery #609

hawkw · 2024-01-16T21:39:07Z

In order to ensure that changes to propolis don't break instance
migration from previous versions, we would like to add automated testing
of migrating an instance from the current master branch of propolis
to run on PR branches. This PR adds an implementation of such a test to
phd.

To implement this, I've built on top of my change from PR #604 and
modified the phd artifact store to introduce a notion of a "base"
Propolis server artifact. This artifact can then be used to test
migration from the "base" Propolis version to the revision under
test. I've added a new test case in migrate.rs that creates a source
VM using the "base" Propolis artifact and attempts to migrate that
instance to a target VM running on the "default" Propolis artifact (the
revision being tested). In order to add the new test, I've factored out
test code from the existing migrate::smoke_test test.

How phd should acquire a "base" Propolis artifact is configured by
several new command-line arguments. --base-propolis-branch takes
the name of a Git branch on the propolis repo. If this argument is
provided, PHD will download the Propolis debug artifact from the HEAD
commit of that branch from Buildomat. Alternatively, the
--base-propolis-commit argument accepts a Git commit hash to
download from Buildomat. Finally, the --base-propolis-cmd argument
takes a local path to a binary to use as the "base" Propolis. All
these arguments are mutually exclusive, and if none of them are
provided, the migration-from-base tests are skipped.

When the "base" Propolis artifact is configured from a Git branch
name (i.e. the --base-propolis-branch CLI argument is passed), we
use the Buildomat /public/branch/{repo}/{branch-name} endpoint, which
returns the Git hash of the HEAD commit to that branch. Then, we attempt
to download an artifact from Buildomat for that commit hash. An issue
here is that Buildomat's branch endpoint will return the latest commit
hash for that branch as soon as it sees a commit, but the artifact for
that commit may not have been published yet, so downloading it will
fail. Ideally, we could resolve this sort of issue by configuring the
phd-run job for PRs to depend on the phd-build job for master, so
that the branch's test run isn't started until any commits that just
merged to master have published artifacts. However, this isn't
basely possible in Buildomat (see oxidecomputer/buildomat#46). As a
temporary workaround, I've added code to the PHD artifact store to retry
downloading Buildomat artifacts with an exponential backoff, for up to a
configurable duration (defaulting to 20 minutes). This allows us to wait
for an in-progress build to complete, with a limit on how long we'll
wait for.

Depends on #604

hawkw · 2024-01-16T23:51:01Z

Hm, okay, there seems to be a bit of an issue where, when a PR has just merged to master and we get its hash from buildomat, the public/branch/{repo}/master endpoint will have been updated with the new commit hash before the artifacts for that hash have actually been built. If we then attempt to download the artifact for that commit when it hasn't been built yet, we get a 500 error from buildomat, which seems unfortunate.

hawkw · 2024-01-17T21:43:40Z

Regarding #609 (comment), I spoke to @jclulow yesterday a bit about what to do when Buildomat has updated the branch's HEAD commit but hasn't actually finished building artifacts for that revision yet. He suggested that we could potentially add the ability to declare that a Buildomat job depends on a different job having uploaded a file (oxidecomputer/buildomat#46), which would let us say that the phd_run job for branches depends on the phd_build job for the master branch having actually published the propolis artifact. That way, we don't start running the tests until any currently in flight build for the master branch artifact is finished, so we don't fail tests due to that artifact being unavailable.

I'd like to potentially look into implementing that in Buildomat. In the meantime, though, I was thinking about changing PHD's artifact store to use a larger number of retries with an exponential backoff duration when downloading buildomat artifacts, so that we end up waiting longer for a potentially-in-flight build to finish. @gjcolombo, what do you think about that as a way to unblock this change before we've made the upstream changes to buildomat?

github-actions

license-eye has totally checked 257 files.

Valid	Invalid	Ignored	Fixed
192	1	64	0

Click to see the invalid file list

phd-tests/framework/src/artifacts/buildomat.rs

phd-tests/framework/src/artifacts/buildomat.rs

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

gjcolombo

Thanks for pulling this together!

gjcolombo · 2024-01-19T18:32:38Z

phd-tests/tests/src/migrate.rs

@@ -105,3 +99,50 @@ fn multiple_migrations(ctx: &Framework) {
        "I have migrated!"
    );
 }
+
+fn run_smoke_test(


I'm curious what this would look like in the context of the lifecycle framework in framework/src/lifecycle.rs (I don't think I pointed you at this previously--mea culpa for that). That module was meant to implement something similar to what you have here: start the VM, interact with it, then stop/start or migrate it and ensure that invariants (in this case the existence of foo.bar) are upheld across each of those transitions.

(I think if we go this route for this test, we should move the serial console history checks into their own test case.)

Oh, neat, I hadn't seen that module --- will look!

Okay, I rewrote these to use the lifecycle framework, and made a separate test for serial history. I've also added a lifecycle test for multiple migrations between the "current" and "default" Propolis artifacts. LMKWYT?

phd-tests/framework/src/artifacts/download.rs

phd-tests/framework/src/artifacts/buildomat.rs

phd-tests/framework/src/artifacts/mod.rs

hawkw · 2024-01-19T21:54:40Z

Btw, @gjcolombo, a non-blocking question: how do you feel about the terminology I've used in this PR? In the artifact store name and in the CLI, we currently refer to the Propolis artifact that we use to represent the HEAD revision as "current Propolis". I chose that name over "HEAD Propolis" because there are CLI options to use an arbitrary Buildomat commit, or a local file, as that artifact, and in that case, they may not actually be from the HEAD commit of a branch. But, "current" feels a bit ambiguous --- it feels like the revision under test could also be called the "current" Propolis, and, indeed, while demoing this change, I think I referred to the revision under test as "current" a few times...if you have any preferences, I'd love to hear them. :)

gjcolombo · 2024-01-19T23:15:42Z

In the long run I'd like to see us have at least three well-known Propolis artifacts:

the Propolis binary that's under test
the most recent Propolis binary from Buildomat, based on Buildomat's definition of "most recent" for master
the Propolis binary that was in Omicron's package manifest at a given Omicron commit (not necessarily from main; I think this is more useful if it's a tagged release commit, though this will require a little more regular maintenance than the just-ask-Buildomat case)

The idea would be to verify not only that incremental build-over-build upgrades work, but that upgrading from the most recent customer release works, since that's the migration we'll actually have to do to upgrade a VM on a customer rack. So on that view I think the artifact labels I might use would be more like __PROPOLIS_UNDER_TEST, __PROPOLIS_BUILDOMAT_MASTER, and __PROPOLIS_OMICRON_PACKAGE (with constant names tweaked roughly to match). WDYT?

hawkw · 2024-01-19T23:25:06Z

The idea would be to verify not only that incremental build-over-build upgrades work, but that upgrading from the most recent customer release works, since that's the migration we'll actually have to do to upgrade a VM on a customer rack. So on that view I think the artifact labels I might use would be more like __PROPOLIS_UNDER_TEST, __PROPOLIS_BUILDOMAT_MASTER, and __PROPOLIS_OMICRON_PACKAGE (with constant names tweaked roughly to match). WDYT?

Overall, this naming scheme seems pretty good to me --- we can call it __PROPOLIS_BUILDOMAT_MASTER for now. However, we currently support multiple CLI args for setting where this artifact actually comes from; if I pass --current-propolis-cmd, it's no longer from "master" or from "buildomat", and if I pass --current-propolis-commit or --current-propolis-branch foobar, it's coming from Buildomat but it's not necessarily coming from "master".

We could just remove those CLI options, though. I only added them because we agreed that downloading the artifact at all should be opt-in with a CLI argument, and I figured that, while I was doing that, we ought to make it maximally configurable. But, we could simplify the CLI args to just --with-master-propolis or something that's just a boolean opt-in to downloading the master artifact at all..

gjcolombo

LGTM (pending whatever adjustments you make to the artifact names per our discussion in the comments). Thanks again for picking this up!

gjcolombo · 2024-01-19T23:26:12Z

But, we could simplify the CLI args to just --with-master-propolis or something that's just a boolean opt-in to downloading the master artifact at all..

Maybe --add-buildomat-propolis?

gjcolombo

(LGTM'ing again for the new test case arrangement.)

hawkw requested a review from gjcolombo January 16, 2024 21:39

hawkw added the testing Related to testing and/or the PHD test framework. label Jan 16, 2024

hawkw mentioned this pull request Jan 17, 2024

jobs could depend on the prior upload of specific published files oxidecomputer/buildomat#46

Open

hawkw force-pushed the eliza/phd-propolis branch from f055035 to 0225f51 Compare January 17, 2024 20:28

Base automatically changed from eliza/phd-crucible to master January 17, 2024 22:59

hawkw added 10 commits January 17, 2024 15:01

refactor buildomat stuff into a module

c7f55ee

phd: add migration from HEAD to current smoke test

4e663d9

whoops lol

260584f

that's what i get for not looking in the tarball

8978aeb

use phd_build series rather than image

a0877b3

post-rebase fixup

25d5747

don't download by default

1a1a069

whoops

7fa225a

blergh

3d3b7d3

add current branch to phd-run.sh

5e3272d

hawkw force-pushed the eliza/phd-propolis branch from 2e4d4bc to 5e3272d Compare January 17, 2024 23:01

github-actions bot reviewed Jan 17, 2024

View reviewed changes

phd-tests/framework/src/artifacts/buildomat.rs Show resolved Hide resolved

hawkw and others added 6 commits January 17, 2024 15:45

blergh license

c3a6f45

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

retry with backoff for failed Buildomat downloads

4668bfb

nicer-looking errors

cbc5f1f

also retry getting SHAs

74a6c4f

slightly nicer errors & logs

e0b02e5

fix wrong propolis artifact filename

8ca690e

hawkw marked this pull request as ready for review January 18, 2024 21:54

whoops that's not supposed to be there

619a83b

hawkw changed the title ~~phd: fetch the latest master Propolis, add migration from master test~~ phd: add basic "migration-from-current" test + machinery Jan 18, 2024

gjcolombo reviewed Jan 19, 2024

View reviewed changes

rm dead module

dfe69d8

hawkw added 2 commits January 19, 2024 11:43

add link to buildomat PR in comments

2907d2e

add comments explaining download behaviors

f21d7b0

hawkw added 2 commits January 19, 2024 14:52

use lifecycle tests

c594abd

fix wrongish tests & add more tests

d56a04b

hawkw requested a review from gjcolombo January 19, 2024 23:13

gjcolombo approved these changes Jan 19, 2024

View reviewed changes

s/current propolis/base propolis

734943b

hawkw changed the title ~~phd: add basic "migration-from-current" test + machinery~~ phd: add basic "migration-from-base" tests + machinery Jan 19, 2024

hawkw merged commit a13ebb9 into master Jan 21, 2024
10 checks passed

hawkw deleted the eliza/phd-propolis branch January 21, 2024 00:02

hawkw mentioned this pull request Jan 26, 2024

PHD: want support for migration between distinct Propolis binaries #528

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

phd: add basic "migration-from-base" tests + machinery #609

phd: add basic "migration-from-base" tests + machinery #609

hawkw commented Jan 16, 2024 •

edited

Loading

hawkw commented Jan 16, 2024

hawkw commented Jan 17, 2024

github-actions bot left a comment

gjcolombo left a comment

gjcolombo Jan 19, 2024

hawkw Jan 19, 2024

hawkw Jan 19, 2024

hawkw commented Jan 19, 2024

gjcolombo commented Jan 19, 2024

hawkw commented Jan 19, 2024

gjcolombo left a comment

gjcolombo commented Jan 19, 2024

gjcolombo left a comment

phd: add basic "migration-from-base" tests + machinery #609

phd: add basic "migration-from-base" tests + machinery #609

Conversation

hawkw commented Jan 16, 2024 • edited Loading

hawkw commented Jan 16, 2024

hawkw commented Jan 17, 2024

github-actions bot left a comment

Choose a reason for hiding this comment

gjcolombo left a comment

Choose a reason for hiding this comment

gjcolombo Jan 19, 2024

Choose a reason for hiding this comment

hawkw Jan 19, 2024

Choose a reason for hiding this comment

hawkw Jan 19, 2024

Choose a reason for hiding this comment

hawkw commented Jan 19, 2024

gjcolombo commented Jan 19, 2024

hawkw commented Jan 19, 2024

gjcolombo left a comment

Choose a reason for hiding this comment

gjcolombo commented Jan 19, 2024

gjcolombo left a comment

Choose a reason for hiding this comment

hawkw commented Jan 16, 2024 •

edited

Loading