Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD-0118: Partitioned Epoch Rewards, amend/extend design #118

Merged
merged 23 commits into from
Mar 21, 2024

Conversation

CriesofCarrots
Copy link
Contributor

This SIMD supersedes SIMD-0015 with some new design elements.
I have begun by copying in the original SIMD so that the changes can be seen more easily in subsequent commits.

#116 may be useful as a reference, as it describes the existing Labs implementation.

@CriesofCarrots
Copy link
Contributor Author

@t-nelson (can't request as reviewer)

Copy link
Contributor

@jstarry jstarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good! Thanks for writing this up and consolidating everything.

proposals/0118-partitioned-epoch-reward-distribution.md Outdated Show resolved Hide resolved
proposals/0118-partitioned-epoch-reward-distribution.md Outdated Show resolved Hide resolved
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple clarifying suggestions. the changes lgtm otherwise. thanks for committing the copy of the original simd with the proposed modifications, it made review very pleasant!

i did notice some other, non-technical changes that might make the document more clear, but probably couldn't justify their own simd. do we want to entertain those here?

Copy link
Contributor Author

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did notice some other, non-technical changes that might make the document more clear, but probably couldn't justify their own simd. do we want to entertain those here?

I say yes. Let's get this as complete and useful as possible. I will keep things separated by commit.

Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i left quite a bit of eye twitch intact for the sake of brevity. these suggestions knock out most of the stumbling i came across that actually brought confusion

@HaoranYi
Copy link
Contributor

Handling the new field is much easier than recompute. And it is a one time cost. After it is implemented, you don't need to worry about it in future. However, adding another code path introduce maintaining cost in future too.

@t-nelson
Copy link
Contributor

the snapshot shortcut is only "easier" in the short term. it is punting the problem down the road to snapshot encoding, (de)ser, storage and distribution. success requires long term thinking, not taking the easy way out.

@HaoranYi
Copy link
Contributor

Regarding the original partitioned reward design to have the sysvar account carries the pending reward balance.

I think there was a concern raised by Anatoly that the total capital did not stay stable during the epoch.

Anatoly has an idea of merkling all the rewards at epoch boundary and depositing the rewards into a reserve account at epoch boundary. Then stake account, when withdrawing the rewards from the reserve account, will submit a merkle proof to verify the reward payment, then transfer the balance from the reserve account.

Later on, we simplified it a bit and removed merkle tree, but we still keep the reward balance in the sysvar so that the total capital is stable during the epoch to address the above concern.

@CriesofCarrots
Copy link
Contributor Author

I think there was a concern raised by Anatoly that the total capital did not stay stable during the epoch.

Total capital is not stable during an epoch now, as run_incinerator is triggered on every Bank freeze and decreases the capitalization by the amount in that account.

@HaoranYi
Copy link
Contributor

How much balance does incinerator account get per epoch? And where does it get the blance?

@CriesofCarrots
Copy link
Contributor Author

How much balance does incinerator account get per epoch? And where does it get the blance?

I don't know how much it receives; the runtime isn't logging that data. I suppose we could write a geyser plugin that would report that (or an indexer might know already). Anyone can transfer lamports to the incinerator to burn them.

@t-nelson
Copy link
Contributor

i'm not sure how total capitalization nor incinerator are relevant here? we just need to be able to recover remaining partition distributions from minimal state when loading snapshots. if we have total&distributed epoch reward lamports and total&distributed epoch credits, Delegations are in accounts (or snapshotted stakes cache? 🫠). what else do we need do this practically?

@HaoranYi
Copy link
Contributor

One thing people may start noticing is that the Sol Supply on solana explorer will starting slowly increase at the epoch boundary.

image

It is more noticeable than a onetime increase at the epoch boundary. Probably, people would want an explanation for that. A slow and gradual increase in the total supply of sol per block may make people worry about inflation...

@HaoranYi
Copy link
Contributor

Total capital is not stable during an epoch now, as run_incinerator is triggered on every Bank freeze and decreases the capitalization by the amount in that account.

The capital change due to incinerator is very minimal. It doesn't make material impact on the total sol. While reward change are much more noticeable than incinerator burning.

@HaoranYi
Copy link
Contributor

HaoranYi commented Feb 26, 2024

i'm not sure how total capitalization nor incinerator are relevant here? we just need to be able to recover remaining partition distributions from minimal state when loading snapshots. if we have total&distributed epoch reward lamports and total&distributed epoch credits, Delegations are in accounts (or snapshotted stakes cache? 🫠). what else do we need do this practically?

Because, that's related to one of the main changes for this SIMD. If I read the SIMD correctly, there are two major chagnes:

  1. sysvar no longer carries the total reward balance.
  2. recompute rewards at restart.

@CriesofCarrots
Copy link
Contributor Author

The capital change due to incinerator is very minimal. It doesn't make material impact on the total sol. While reward change are much more noticeable than incinerator burning.

Because that is user-dependent, we absolutely cannot rely on that assumption. However, it sounds like epoch-capitalization stability is not a technical concern in the first place; just a comms issue. Since there will need to be communication about things like stake withdrawals being unavailable during the rewards period anyway, we can also explain the new shape of supply increases.

t-nelson
t-nelson previously approved these changes Feb 27, 2024
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+ one nit.

proposals/0118-partitioned-epoch-reward-distribution.md Outdated Show resolved Hide resolved
Copy link
Contributor

@riptl riptl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I have two nits. (Sorry am on vacation so can't use my work account @ripatel-fd)

Copy link
Contributor

@godmodegalactus godmodegalactus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks a very good direction

Comment on lines +21 to +26
The distribution of epoch rewards at the start block of an epoch becomes a
significant bottleneck due to the rising number of stake accounts and voting
nodes on the network.

To address this bottleneck, we propose a new approach for distributing the
epoch rewards over multiple blocks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - have we measured that it is the distribution (write-back) versus the calculation that is the bottleneck?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely, although it doesn't seem like we have that data summarized concisely in any one place. It seems to be mostly in various places in the #proj-epoch-boundary-optimization channel on discord. For instance, here's a comment about the calculation time: https://discord.com/channels/428295358100013066/960593861732884520/1096146390037561414 (64ms for 550K stake accounts)
Whereas distribution is more like 5-10s.

Comment on lines +195 to +207
When booting from a snapshot, a node must check the EpochRewards sysvar account
to determine whether the distribution phase is active. If so, the node must
rerun the rewards partitioning using the `EpochRewards::num_partitions` and
`EpochRewards::parent_blockhash` sysvar fields and determining the upcoming
partitions by comparing its current block height to
`EpochRewards::distribution_starting_block_height`. Then the runtime must
recalculate the remaining rewards using the `EpochRewards::total_points` and
`EpochRewards::total_rewards` sysvar fields, as well as the `EpochStakes` in the
snapshot. The recalculated rewards can be confirmed by comparing a sum of the
rewards remaining (those partitions expected to not yet have been distributed)
with the difference between the `EpochRewards::total_rewards` and
`EpochRewards::distributed_rewards` fields. Partitions for blocks prior to the
current block height can be discarded.
Copy link

@topointon-jump topointon-jump Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rewards calculation assumes that the calculation is done at the epoch boundary (for example, using VoteState::epoch_credits). We need to make sure that re-calculating rewards when booting off a snapshot doesn't break the calculation, as this assumption is no longer true. I'm a bit worried that calculating rewards not at an epoch boundary will produce a different result.

I might be wrong but just wanted to flag as something to watch for when implementing this 😄

Copy link

@topointon-jump topointon-jump Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, I think solana_stake_program::stake_state::calculate_stake_points_and_credits might end up being tweaked slightly, as well as anywhere which uses VoteState::epoch_credits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is certainly the most sensitive aspect of the implementation. However, are you asking for any SIMD changes here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not in the SIMD, as I don't think this will be fully known until it's implemented. Just wanted to flag 😄

lheeger-jump
lheeger-jump previously approved these changes Mar 19, 2024
Copy link
Contributor

@lheeger-jump lheeger-jump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

t-nelson
t-nelson previously approved these changes Mar 20, 2024
proposals/0118-partitioned-epoch-reward-distribution.md Outdated Show resolved Hide resolved
@CriesofCarrots CriesofCarrots dismissed stale reviews from t-nelson and lheeger-jump via cdda8cd March 20, 2024 23:30
Co-authored-by: Trent Nelson <[email protected]>
Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@jacobcreech jacobcreech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like consensus has been reached between the Firedancer and Anza teams. Thank you @CriesofCarrots for championing this!

@jacobcreech jacobcreech merged commit 2b1640f into solana-foundation:main Mar 21, 2024
2 checks passed
@billythedummy
Copy link

To clarify, with this change, would the various stake pool programs (spl, marinade) need to upgrade to have their epoch update crank instruction fail if EpochRewards.active is true?

@CriesofCarrots
Copy link
Contributor Author

To clarify, with this change, would the various stake pool programs (spl, marinade) need to upgrade to have their epoch update crank instruction fail if EpochRewards.active is true?

@billythedummy , I'm not personally familiar with the various stake-pool programs or the instruction you reference, but if they have operations that depend on rewards distribution being complete, then the answer is yes.
With this change, the Stake Program will be updated so that only credit-only stake-account operations will succeed when EpochRewards.active, so stake-pool operations that depend on mutating stake accounts will fail during the rewards period.

@billythedummy
Copy link

@CriesofCarrots each stake pool program basically have crank instructions that on every epoch reads the new balances of the stake accounts it owns, which are supposed to have increased at the epoch boundary due to staking rewards, and updates the exchange rate between SOL and the pool’s LST. These instructions dont mutate stake accounts in all cases, so i think we would need the EpochRewards.active check on the code paths that only read the balances of the stake accounts.

Tagging stake pool programs maintainers here for visibility @joncinque @ochaloup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.