NEPs on how to split states of shards for resharding #241

Merged
merged 14 commits into from
Sep 14, 2022
74 changes: 34 additions & 40 deletions specs/Proposals/0040-split-states.md
Expand Up @@ -36,7 +36,7 @@ This in turn involves two parts, how the validators know when and how sharding c
The former is a protocol change and the latter only affects validators' internal states.

### Protocol Change
Sharding config for an epoch will be encapsulated in a struct `ShardLayout`, which not only contains number of shards, but also layout information to decide which account ids should be mapped to which shards.
Sharding config for an epoch will be encapsulated in a struct `ShardLayout`, which not only contains the number of shards, but also layout information to decide which account ids should be mapped to which shards.
The `ShardLayout` information will be stored as part of `EpochConfig`.
Right now, `EpochConfig` is stored in `EpochManager` and remains static across epochs.
That will be changed in the new implementation so that `EpochConfig` can be changed according to protocol versions, similar to how `RuntimeConfig` is implemented right now.
Expand Down Expand Up @@ -115,20 +115,21 @@ pub enum ShardLayout {
ShardLayout is a versioned struct that contains all information needed to decide which accounts belong to which shards. Note that `ShardLayout` only contains information at the protocol level, so it uses `ShardOrd` instead of `ShardId`.

The API contains the following two functions.
#### `parent_shards`
```rust
pub fn parent_shards(&self) -> Vec<ShardOrd>
#### `get_split_shards`
```
pub fn get_split_shards(&self, parent_shard_id: ShardId) -> Option<&Vec<ShardId>>
```
returns a vector of shards ords consisting of the shard ord of the parent shard of the current shard of position in this array. This information is needed for constructing states for the new shards.
returns the child shards of shard `parent_shard_id` (parent and child shards are explained below). Note that `parent_shard_id` refers to a shard in the previous `ShardLayout`, not in `self`. The returned `ShardId`s refer to shards in the current shard layout.
This information is needed for constructing states for the new shards.

We only allow adding new shards that are split from the existing shards. If shard B and C are split from shard A, we call shard A the parent shard of shard B and C.
For example, if epoch T-1 has two shards with `shard_ord` 0 and 1 and each of them will be split to two shards in epoch T, then the calling `parent_shards` on the shard layout of epoch T will return `[0, 0, 1, 1]`.
For example, if epoch T-1 has a shard layout `shard_layout0` with two shards with `shard_ord` 0 and 1, and each of them is split into two shards in `shard_layout1` in epoch T, then `shard_layout1.get_split_shards(0)` returns `[0,1]` and `shard_layout1.get_split_shards(1)` returns `[2,3]`.
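The example above can be sketched in code as follows. This is only an illustration under stated assumptions: the flat `ShardLayout` struct, the `shards_split_map` field, and the `example_layout` helper are hypothetical names, and `ShardId` is assumed to be `u64`. The layout described in this NEP is a versioned enum, so the real lookup may be organized differently.

```rust
// Hypothetical, simplified stand-in for the versioned `ShardLayout`.
type ShardId = u64; // assumption: shard ids are plain integers

pub struct ShardLayout {
    /// Indexed by the parent's shard id in the previous layout. Each entry
    /// lists that parent's child shards in this layout. `None` means this
    /// layout was not produced by splitting an earlier one.
    shards_split_map: Option<Vec<Vec<ShardId>>>,
}

impl ShardLayout {
    /// Returns the children of `parent_shard_id`, or `None` if the layout
    /// has no split information or the parent id is unknown.
    pub fn get_split_shards(&self, parent_shard_id: ShardId) -> Option<&Vec<ShardId>> {
        self.shards_split_map
            .as_ref()
            .and_then(|map| map.get(parent_shard_id as usize))
    }
}

/// Builds the epoch T layout from the example: shard 0 splits into
/// shards 0 and 1, shard 1 splits into shards 2 and 3.
pub fn example_layout() -> ShardLayout {
    ShardLayout {
        shards_split_map: Some(vec![vec![0, 1], vec![2, 3]]),
    }
}
```

Returning `Option` lets callers distinguish a layout that does not split anything from one that does, without a separate flag.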

#### `version`
```rust
pub fn version(&self) -> ShardVersion
```
returns the version number of this shard layout. This version number is also used to create `ShardUId` for shards in this `ShardLayout`.
returns the version number of this shard layout. This version number is used to create `ShardUId` for shards in this `ShardLayout`. The version numbers must be different for all shard layouts used in the blockchain.

#### `account_id_to_shard_id`
```rust
Expand All @@ -143,7 +144,7 @@ pub struct ShardLayoutV0 {
num_shards: NumShards,
}
```
A shard layout that maps accounts evenly across all shards. This is added to capture the current `account_id_to_shard_id` algorithm, to keep backward compatibility for some existing tests. `parent_shards` for `ShardLayoutV1` is always `None` and `version`is always `0`.
A shard layout that maps accounts evenly across all shards by computing the hash of the account id modulo the number of shards. This is added to capture the current `account_id_to_shard_id` algorithm, to keep backward compatibility for some existing tests. `parent_shards` for `ShardLayoutV0` is always `None` and `version` is always `0`.
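A minimal sketch of the hash-and-modulo mapping, for illustration only. The hash function is an assumption: `std`'s `DefaultHasher` is used here as a stand-in so the example is self-contained, and it is not the hash used by the protocol. Treating account ids as `&str` and the integer aliases as `u64` are likewise assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type NumShards = u64; // assumption
type ShardId = u64; // assumption

pub struct ShardLayoutV0 {
    num_shards: NumShards,
}

impl ShardLayoutV0 {
    /// Maps an account to a shard: hash(account_id) mod num_shards.
    /// `DefaultHasher` is a placeholder for the protocol's real hash.
    pub fn account_id_to_shard_id(&self, account_id: &str) -> ShardId {
        let mut hasher = DefaultHasher::new();
        account_id.hash(&mut hasher);
        hasher.finish() % self.num_shards
    }
}
```

Because the result depends on `num_shards`, changing the number of shards under this layout would remap most accounts, which is one reason a range-based layout is more convenient for splitting.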

#### `ShardLayoutV1`
```rust
Expand All @@ -163,7 +164,7 @@ pub struct ShardLayoutV1 {
A shard layout that consists of some fixed shards, each of which is mapped to a fixed account, and other shards which are mapped to ranges of accounts. This will be the `ShardLayout` used by Simple Nightshade.

### `EpochConfig`
`EpochConfig` will contain the shard layout info.
`EpochConfig` will contain the shard layout for the given epoch.

```rust
pub struct EpochConfig {
Expand All @@ -173,7 +174,7 @@ pub struct EpochConfig {
pub shard_layout: ShardLayout,
```
### `AllEpochConfig`
`AllEpochConfig` stores information needed to construct `EpochConfig` for all epochs. For SimpleNightshade migration, it only needs to contain two configs. `AllEpochConfig` will be stored in `EpochManager` to be used to construct `EpochConfig` for different epochs.
`AllEpochConfig` stores a mapping from protocol versions to `EpochConfig`s. `EpochConfig` for a particular epoch can be retrieved from `AllEpochConfig`, given the protocol version of the epoch. For SimpleNightshade migration, it only needs to contain two configs. `AllEpochConfig` will be stored inside `EpochManager` to be used to construct `EpochConfig` for different epochs.

```rust
pub struct AllEpochConfig {
Expand All @@ -187,6 +188,14 @@ pub fn for_protocol_version(&self, protocol_version: ProtocolVersion) -> &Arc<Ep
```
returns `EpochConfig` according to the given protocol version. `EpochManager` will call this function for every new epoch.
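One way this lookup could work for the two-config case is sketched below. The field names, the `num_shards` stand-in for `shard_layout`, and the `Arc<EpochConfig>` return type are assumptions made for illustration; the actual struct contents are not shown in this diff.

```rust
use std::sync::Arc;

type ProtocolVersion = u32; // assumption

#[derive(Debug, PartialEq)]
pub struct EpochConfig {
    /// Stand-in for `shard_layout: ShardLayout` and the other fields.
    pub num_shards: u64,
}

pub struct AllEpochConfig {
    // Hypothetical fields: one config before the migration, one after,
    // and the protocol version at which the second one takes effect.
    genesis_epoch_config: Arc<EpochConfig>,
    simple_nightshade_epoch_config: Arc<EpochConfig>,
    simple_nightshade_protocol_version: ProtocolVersion,
}

impl AllEpochConfig {
    /// Picks the config that applies to `protocol_version`.
    pub fn for_protocol_version(&self, protocol_version: ProtocolVersion) -> &Arc<EpochConfig> {
        if protocol_version >= self.simple_nightshade_protocol_version {
            &self.simple_nightshade_epoch_config
        } else {
            &self.genesis_epoch_config
        }
    }
}
```

Since the choice depends only on the protocol version, every validator that agrees on an epoch's protocol version derives the same `ShardLayout` for it.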

### `EpochManager`
`EpochManager` will be responsible for managing `ShardLayout` across epochs. As mentioned above, `EpochManager` stores an instance of `AllEpochConfig`, so it can return the `ShardLayout` for each epoch.

#### `get_shard_layout`
```rust
pub fn get_shard_layout(&mut self, epoch_id: &EpochId) -> Result<&ShardLayout, EpochError>
```

## Internal Shard Representation in Validators' State
### `ShardUId`
`ShardUId` is a unique identifier that a validator uses internally to identify shards from all epochs. It only exists inside a validator's internal state and can be different among validators, thus it should never be exposed to outside APIs.
Expand All @@ -197,8 +206,17 @@ pub struct ShardUId {
pub shard_id: u32,
}
```

`version` in `ShardUId` comes from the version of the `ShardLayout` that this shard belongs to. This way, shards from different shard layouts will have different `ShardUId`s.

### Database storage
The following database columns currently use `ShardId` as part of the database key; it will be replaced by `ShardUId`:
- ColState
- ColChunkExtra
- ColTrieChanges
Comment on lines +212 to +216
Collaborator

Nice!


#### `TrieCachingStorage`
Trie storage will be contruct database key from `ShardUId` and hash of the trie node.
Trie storage will contruct database key from `ShardUId` and hash of the trie node.
Collaborator

Suggested change
Trie storage will contruct database key from `ShardUId` and hash of the trie node.
Trie storage will construct database key from `ShardUId` and hash of the trie node.

##### `get_shard_uid_and_hash_from_key`
```rust
fn get_shard_uid_and_hash_from_key(key: &[u8]) -> Result<(ShardUId, CryptoHash), std::io::Error>
Expand All @@ -209,36 +227,12 @@ fn get_key_from_shard_uid_and_hash(shard_uid: ShardUId, hash: &CryptoHash) -> [u
```
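A possible encoding for these two functions is sketched below. The key layout is an assumption, not something this diff specifies: 4 bytes of `version`, 4 bytes of `shard_id` (both little-endian), then the 32-byte node hash, for 40 bytes in total. `CryptoHash` is modeled as a plain `[u8; 32]`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShardUId {
    pub version: u32,
    pub shard_id: u32,
}

pub type CryptoHash = [u8; 32]; // stand-in for the real hash type

const KEY_LEN: usize = 40; // assumed: 4 + 4 + 32 bytes

/// Concatenates version, shard id and node hash into one database key.
pub fn get_key_from_shard_uid_and_hash(shard_uid: ShardUId, hash: &CryptoHash) -> [u8; KEY_LEN] {
    let mut key = [0u8; KEY_LEN];
    key[0..4].copy_from_slice(&shard_uid.version.to_le_bytes());
    key[4..8].copy_from_slice(&shard_uid.shard_id.to_le_bytes());
    key[8..].copy_from_slice(hash);
    key
}

/// Inverse of the function above; rejects keys of the wrong length.
pub fn get_shard_uid_and_hash_from_key(
    key: &[u8],
) -> Result<(ShardUId, CryptoHash), std::io::Error> {
    if key.len() != KEY_LEN {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "trie key must be 40 bytes",
        ));
    }
    let mut version = [0u8; 4];
    version.copy_from_slice(&key[0..4]);
    let mut shard_id = [0u8; 4];
    shard_id.copy_from_slice(&key[4..8]);
    let mut hash = [0u8; 32];
    hash.copy_from_slice(&key[8..]);
    let shard_uid = ShardUId {
        version: u32::from_le_bytes(version),
        shard_id: u32::from_le_bytes(shard_id),
    };
    Ok((shard_uid, hash))
}
```

Putting the `ShardUId` first means all nodes of one shard share a key prefix, so the same node hash stored under two different shard layouts does not collide.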


### `EpochManager`
`EpochManager` will be responsible for managing shard ids accross epochs. Information regarding shard ids in an epoch will be stored in a struct `ShardsInfo`, which will be part of `EpochInfo`. `EpochManager` assigns shard ids for shards in a new epoch when it builds `EpochInfo` for the epoch, in function `finalize_epoch`.

When constructing `EpochInfo` for a new epoch, `EpochManager` creates the `EpochConfig` for the protocol version of the epoch. Then it assigns shard ids for the shards in the new epoch according to `ShardLayout` in the `EpochConfig`.


### ShardTracker
Various functions such as `account_id_to_shard_id` in `ShardTracker` will be changed to incorporate the change in `ShardId`.
The `num_shards` field will be removed from `ShardTracker` since it is no longer a static number.
The current shard information can be accessed by the following functions.
These changes will also be propagated to wrapper functions in `RuntimeAdapter` since `ShardTracker` cannot be directly accessed through `RuntimeAdapter`.

#### `get_shards`
```rust
pub fn get_shards() -> Vec<ShardId>
```
returns the `shard_id`s of the shards in the current epoch.

#### `get_shards_next_epoch`
```rust
pub fn get_shards_next_epoch() -> (Vec<ShardId>, HashMap<ShardId, ShardId>)
```
returns the `shard_id`s of the shards in the next epoch and a map from those shards to their parent shards.

## Build New States
The following method in `Client` will be added or modified to split a shard's current state into multiple states.
The following method in `Chain` will be added or modified to split a shard's current state into multiple states.

### `split_shards`
### `build_state_for_split_shards`
```rust
pub fn split_shards(me: &Option<AccountId>, sync_hash: CryptoHash, shard_id: ShardId)
pub fn build_state_for_split_shards(&mut self, sync_hash: &CryptoHash, shard_id: ShardId) -> Result<(), Error>
```
builds states for the new shards that the shard `shard_id` will be split to.
After this function is finished, the states for the new shards should be ready in `ShardTries` to be accessed.
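The core idea of splitting one shard's state can be illustrated with a toy model. Everything here is a simplification and an assumption: state is modeled as a flat map from account id to a value, whereas the real state is a trie held in `ShardTries`, and `split_state` is a hypothetical helper rather than part of the proposed API.

```rust
use std::collections::{BTreeMap, HashMap};

type ShardId = u64; // assumption
type AccountId = String; // assumption

/// Distributes every entry of the parent state into the state of the child
/// shard that the new layout assigns its account to.
pub fn split_state(
    parent_state: &BTreeMap<AccountId, Vec<u8>>,
    children: &[ShardId],
    account_id_to_shard_id: impl Fn(&AccountId) -> ShardId,
) -> HashMap<ShardId, BTreeMap<AccountId, Vec<u8>>> {
    // Start with an empty state for every child shard.
    let mut result: HashMap<_, _> = children
        .iter()
        .map(|id| (*id, BTreeMap::new()))
        .collect();
    for (account, value) in parent_state {
        let child = account_id_to_shard_id(account);
        result
            .get_mut(&child)
            .expect("account must map to one of the child shards")
            .insert(account.clone(), value.clone());
    }
    result
}
```

In this model the union of the child states equals the parent state, which is the property the real function needs to preserve when it builds the new tries.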
Expand Down Expand Up @@ -338,7 +332,7 @@ Althought we need to handle garbage collection eventually, it is not a pressing
[drawbacks]: #drawbacks

The drawback of this approach is that it will not work when challenges are enabled since challenges to the transition to the new states will be too large to construct or verify.
Thus, most of the change will likely be a one time use that only works for the Simple Nightshade transition, although part of the change involing `ShardId` may be reused in the future.
Thus, most of the change will likely be a one time use that only works for the Simple Nightshade transition, although part of the change involving `ShardId` may be reused in the future.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives
Expand All @@ -357,7 +351,7 @@ However, the implementaion of those approaches are overly complicated and does n
- Garbage collection
- State Sync?
- What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
- There might be small changes in the detailed implemenations or specifications of some of the functions described above, but the overall structure will not be changed.
- There might be small changes in the detailed implementations or specifications of some of the functions described above, but the overall structure will not be changed.
- What related issues do you consider out of scope for this NEP that could be addressed in the future independently of the solution that comes out of this NEP?
- One issue that is related to this NEP but will be resolved independently is how trie nodes are stored in the database.
Right now, it is a combination of `shard_id` and the node hash.
Expand Down