Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: nv22: lightweight patch to upgrade calibnet #11776

Merged
merged 1 commit into from
Mar 27, 2024

Conversation

arajasek
Copy link
Contributor

@arajasek arajasek commented Mar 22, 2024

TODO

  • Add the new actors bundle to the v13 tar.zst, renaming the buggy bundle to have the calibrationnet-13-rc3 suffix
  • Add the new verifreg CIDs
  • Resolve outstanding TODOs about whether any other CIDs change
  • Test (run a dry-run on calibnet)
  • Edit this PR to NOT go into the releases branch directly (I assume we want it to go into a newly created release/v1.26.1 branch, but I'll defer to the team on that)

Related Issues

The calibration test network upgraded to nv22 / actors v13 2 weeks ago. In the time since then, new product requirements have emerged for the upcoming FIP-0083 (see proposals here and here). The current plan is to ship these proposed changes to mainnet AND retroactively apply them to calibnet (which is already running a version of actors v13).

Unfortunately, this is consensus-critical, and so needs a coordinated upgrade on the testnet in order to land the new code. This PR proposes a relatively easy way to achieve this.

Proposed Changes

  • Create a new UpgradeCalibrationDragonFixHeight upgrade that ONLY exists for calibrationnet (disabled for all others)
  • At that height, upgrade the verifreg actor to the new verifreg actor CID (and any other actors that need to change)
  • At that height, upgrade the system actor's manifest to refer to the "new" manifest data
  • Include the patched actors bundle as the v13 bundle for calibrationnet
  • Also, explicitly add the "previous" actors bundle to v13.tar.zst
  • Also, explicitly add the "pervious" verifreg actor info to actorMeta
  • Tweak the "real" v13 migration (that already ran) to upgrade to the "previous" bundle (this is only relevant for syncing calibnet from scratch).

Additional Info

  • Needs to be tested, we'll do that by:
    • simulating this on calibnet
  • Could also consider:
    • Doing a "full" upgrade on calibnet with new network / actors version, etc.
    • Leaving calibnet as-is for now, it'll get patched in the next upgrade.

Checklist

Before you mark the PR ready for review, please make sure that:

  • Commits have a clear commit message.
  • PR title is in the form of of <PR type>: <area>: <change being made>
    • example: fix: mempool: Introduce a cache for valid signatures
    • PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    • area, e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps
  • If the PR affects users (e.g., new feature, bug fix, system requirements change), update the CHANGELOG.md and add details to the UNRELEASED section.
  • New features have usage guidelines and / or documentation updates in
  • Tests exist for new functionality or change in behavior
  • CI is green

@arajasek arajasek requested a review from a team as a code owner March 22, 2024 15:51
@Stebalien
Copy link
Member

If we plan to ship this to mainnet, isn't the 1.26 release that we just cut going to cause a problem?

@rvagg
Copy link
Member

rvagg commented Mar 25, 2024

Leaving calibnet as-is for now, it'll get patched in the next upgrade.

Can we not do this? I'd really like to be able to test some events integration work on calibnet and not having the full event versions on there makes that hard.

I gather that the main task here is to get consensus on an upgrade height for calibnet from its major users and then ship a 1.26.1 with a patch for it? What's the smallest window of time that we can reasonably do this without causing major pain for general calibnet users?

@arajasek
Copy link
Contributor Author

If we plan to ship this to mainnet, isn't the 1.26 release that we just cut going to cause a problem?

@Stebalien I'm not sure I understand the question / the problem. 1.26.0, which we just cut, will sync mainnet and syncs calibnet as it is today. If we do this (that is, if we have a "lightweight" calibnet patch to pull in the latest actor event changes), we'll land this PR and cut a 1.26.1, which must be used to sync calibnet for the lightweight upgrade. Both 1.26.0 and 1.26.1 will sync mainnet well.

@arajasek
Copy link
Contributor Author

Can we not do this? I'd really like to be able to test some events integration work on calibnet and not having the full event versions on there makes that hard.

So I'll be honest, I don't think we should do this, because it is a good amount of overhead and community management, plus some janky legacy code (this PR's changes, basically) that will live in Lotus for a while. I would like to say that we should be able to test the event integration work through itests & other test networks (butterflynet), but I'm not close enough to the work to know whether that's feasible.

But, as far as I understood it, the decision that was made last week in standup was to ship this patch, and the dev work for it is done (this PR). I'm not advocating to reevaluate that decision (we made the call, let's do it), I just wanted to include it as a "we could also consider X" for PR completeness.

I gather that the main task here is to get consensus on an upgrade height for calibnet from its major users and then ship a 1.26.1 with a patch for it? What's the smallest window of time that we can reasonably do this without causing major pain for general calibnet users?

Yes, that's correct. In theory, we try to allow at least one week between a "release" for calibnet and an upgrade, but we've definitely violated that (a lot) in the past, especially with the monkey-patching of nv21 last November. Realistically, 48-72 hours is probably fine. However, this isn't a Lotus decision, we'd need to discuss this in #fil-implementers.

@arajasek arajasek force-pushed the asr/calibnet-dragon-patch-1.26 branch from c7c06bb to 9e0bb17 Compare March 25, 2024 13:37
@Stebalien
Copy link
Member

@Stebalien I'm not sure I understand the question / the problem. 1.26.0, which we just cut, will sync mainnet and syncs calibnet as it is today. If we do this (that is, if we have a "lightweight" calibnet patch to pull in the latest actor event changes), we'll land this PR and cut a 1.26.1, which must be used to sync calibnet for the lightweight upgrade. Both 1.26.0 and 1.26.1 will sync mainnet well.

Ah, I thought we were making the same change on mainnet.

So I'll be honest, I don't think we should do this, because it is a good amount of overhead and community management, plus some janky legacy code (this PR's changes, basically) that will live in Lotus for a while. I would like to say that we should be able to test the event integration work through itests & other test networks (butterflynet), but I'm not close enough to the work to know whether that's feasible.

I guess, what did we actually get from this change? What are we actually trying to test?

@rvagg
Copy link
Member

rvagg commented Mar 26, 2024

What are we actually trying to test?

  1. that we don't get any surprise consensus headaches with the new verifreg v13 changes before they go live on mainnet.
  2. being able to consume and start to build tooling around what the verifreg events will actually look like when they are live on mainnet - we already have integrators looking at calibnet events to prepare for the upgrade.

It's not the end of the world though, as long as we're OK with going live with code that's different from what's been tested in a secondary live network. We tried to keep the diff small to help with confidence in the changes, but it's not nothing: filecoin-project/builtin-actors@v13.0.0-rc.3...v13.0.0

@rjan90
Copy link
Contributor

rjan90 commented Mar 27, 2024

Reached consensus with Forest and Venus on the patch time for Calibnet, and the switch epoch we landed on was 1493854, which should correspond to April 3rd - 11:00:00UTC.

@@ -194,7 +195,8 @@ func readEmbeddedBuiltinActorsMetadata(bundle string) ([]*BuiltinActorsMetadata,
// The following manifest cids existed temporarily on the calibnet testnet
// We include them in our builtin bundle, but intentionally omit from metadata
if root == cid.MustParse("bafy2bzacedrunxfqta5skb7q7x32lnp4efz2oq7fn226ffm7fu5iqs62jkmvs") ||
root == cid.MustParse("bafy2bzacebl4w5ptfvuw6746w7ev562idkbf5ppq72e6zub22435ws2rukzru") {
root == cid.MustParse("bafy2bzacebl4w5ptfvuw6746w7ev562idkbf5ppq72e6zub22435ws2rukzru") ||
root == cid.MustParse("bafy2bzacea4firkyvt2zzdwqjrws5pyeluaesh6uaid246tommayr4337xpmi") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arajasek Why either or ? Should we not just migrate to the "right" one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bafy2bzacedrunxfqta5skb7q7x32lnp4efz2oq7fn226ffm7fu5iqs62jkmvs and bafy2bzacebl4w5ptfvuw6746w7ev562idkbf5ppq72e6zub22435ws2rukzru are the "incorrect" manifests from nv21 (Watermelon). bafy2bzacea4firkyvt2zzdwqjrws5pyeluaesh6uaid246tommayr4337xpmi is the incorrect one currently running on calibnet.

These manifests are part of our builtin bundle (since they need to be present for the portions of the chain history when they were executed), but we don't want their metadata generated, so we skip them here. As a result only the "right" one is present in metadata.

}, {
Height: build.UpgradeCalibrationDragonFixHeight,
Network: network.Version22,
Migration: upgradeActorsV13VerifregFix(calibnetv13BuggyVerifregCID1, calibnetv13CorrectManifestCID1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be from calibnetv13BuggyManifestCID1 to calibnetv13CorrectManifestCID1 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the params here aren't "buggy" and "correct" manifests, they're the buggy ACTOR cid (used for sanity-checking) and the correct manifest CID. See usage in function belowe.

@arajasek arajasek changed the base branch from releases to release/v1.26.1 March 27, 2024 15:57
@arajasek arajasek force-pushed the asr/calibnet-dragon-patch-1.26 branch from 9e0bb17 to 3d303f7 Compare March 27, 2024 15:58
@arajasek arajasek force-pushed the asr/calibnet-dragon-patch-1.26 branch from 3d303f7 to 0b3df75 Compare March 27, 2024 17:38
@arajasek arajasek force-pushed the asr/calibnet-dragon-patch-1.26 branch from 0b3df75 to 03bb6a9 Compare March 27, 2024 17:41
@arajasek arajasek merged commit 4d0d520 into release/v1.26.1 Mar 27, 2024
91 of 94 checks passed
@arajasek arajasek deleted the asr/calibnet-dragon-patch-1.26 branch March 27, 2024 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ☑️ Done (Archive)
Development

Successfully merging this pull request may close these issues.

5 participants