-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEPs on how to split states of shards for resharding #241
Conversation
Your Render PR Server URL is https://nomicon-pr-241.onrender.com. Follow its progress at https://dashboard.render.com/static/srv-c3r0fns6fj3f3m2jjpi0. |
Most of the spec is done. Still need to fill some analysis sections like Rationales and alternatives, Unresolved questions, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One small thing: we prefer to have one sentence per line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, it is highly readable!
specs/Proposals/0040-split-states.md
Outdated
is not a reason to accept the current or a future NEP. Such notes should be | ||
in the section on motivation or rationale in this or subsequent NEPs. | ||
The section merely provides additional information. | ||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well we've discussed possibly switching to on-chain state updates once we introduce challenges...
Thank you for thorough NEP and following https://gov.near.org/t/quality-control-for-protocol-api-changes/1941#the-format-4 especially for talking about drawbacks and alternatives! Can we also add a list of affected projects (maybe @bowenwang1996 can help since he has a lot of context) and pre-mortem (imagine a most likely scenario of how this design change can lead to a failure). Thanks! |
…ersion. Addressed some comments but not all. Still need to finish the Future section
I am sorry for jumping late to the discussion, but reviewing near/nearcore#4554, I feel like from the end-user point of view, they should care about some "shard id" (I feel it does not matter if it is unique in the epoch or unique across the epochs). Here are the questions I suggest we ask ourselves: Why do you believe outside world should care about shard ord rather than unique shard id? If we want some unique shard id, we may just introduce ShardUniqueId (ShardUid) instead of going through this massive renaming effort. |
@frol thanks for the detailed comment! Regarding your questions
That's a great question. I think unique shard id is a concept that closely tied to how we choose to represent shard internally in the state. Thus it may subject to change in the future if the implementation changes. By hiding it away from protocol level and the outside world, such change would not trigger protocol level upgrade in the future.
You are right on this. Initially I didn't think renaming would be such an invasive change and I wanted the naming to highlight the difference between "ord" and "id". If you think the renaming creates a lot of trouble on the API side, we can definitely not do it and just create a new name for the new id like you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question that occurred to me day: what happens to nonvalidator nodes which specify the shards they track? It seems that, according to what is proposed here, those nodes will start tracking all the shards that the original shard split into. The side effect is that what the node actually tracks will no longer be the same as what is specified in config.json
and if a node stops and starts again after the split is done, it will stop tracking the shards whose ids are not in tracked_shards
. This could be confusing for people who need to run nonvalidator nodes and we should at least document such peculiar behavior. It would certainly be better if we can make this experience more smooth for nonvalidator nodes as well.
That's a good point. If we want to make the process smoother for validator nodes, when specifying shards to track, we can make them supply the epoch_id along with the shard_id as well, then at start, ShardTracker can figure out what shards these shards become in the current shards. |
Well the main problem here is that let's say someone wants to track all shards. Today they will specify
|
Yeah I understand what you meant. I think the issue again is that the meaning of shard id changes when resharding happens. One way to solve that is to be more specific than just providing the shard_id. The config file will include the epoch_id (or protocol version) along with the shard id. For example, let's say a validator starts with tracking shard 0 and they start from genesis. Then the config file will have tracked_shards: [0], tracked_shards_epoch: genesis_epoch_id. Then later when the validator restarts, the ShardTracker can know the shard |
As I am implementing the change regarding As I mentioned, the downside of moving away from @bowenwang1996 @abacabadabacaba what are your thoughts? |
@mzhangmzz using the same type does not really prevent us from mixing them up? It seems that we need some internal representation that is different anyways. So it seems that introducing a strictly different type may be helpful (this could be purely an implementation detail) |
I suggest to take a step back and ask ourselves why would anyone care about shard number X? and how would they know that they want to track exactly the shard number X? Maybe they care about a set of specific accounts and we should track based on the list of account ids instead of the shard ord/id/uid? |
I think that's a very good point. Maybe the outside world should not have information about the internal shard layout at all, since we want the sharding implementation to be as transient as possible. We can provide two options, tracking all shards, or tracking shards from accounts. |
It seems that there are the following use cases:
Today we already support tracking a set of accounts through |
This changeset implements ShardLayout and ShardConfig in near/NEPs#241
This PR implements ShardUId as specified in near/NEPs#241 so that trie states are indexed by 'shard_uidinstead ofshard_id`. The main change is to change the interface around trie storage to use shard_uid instead of shard_id. The database columns ColState and ColTrieChanges are now indexed by shard_uid instead of shard_id. Changes that this PR does not include: change account_id_to_shard_id to account for changing shard layout gc Test Plan: The main source for errors is where shard_id and shard_uid is mixed up or the wrong shard_version is used for shard_uid, which may result in DB not found error. To test for the first type, we will change a wide range of existing unit and integration tests to use a shard layout with shard_version = 1 so shard_id and shard_uid cannot be mixed up. Co-authored-by: Bowen Wang <[email protected]>
In `apply_chunks`, after processing each chunk, the state changes in `apply_results` are sorted into changes to new shards. | ||
At the end, we apply these changes to the new shards. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this should also be done in a different thread, but for this specific upgrade I think it is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update!
### Database storage | ||
The following database columns are stored with `ShardId` as part of the database key, it will be replaced by `ShardUId` | ||
- ColState | ||
- ColChunkExtra | ||
- ColTrieChanges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
#### `TrieCachingStorage` | ||
Trie storage will be contruct database key from `ShardUId` and hash of the trie node. | ||
Trie storage will contruct database key from `ShardUId` and hash of the trie node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trie storage will contruct database key from `ShardUId` and hash of the trie node. | |
Trie storage will construct database key from `ShardUId` and hash of the trie node. |
Move SimpleNIghtshade from nightly to stable. This feature enables the transition from one shard to four shards. For more details, see https://near.org/blog/near-launches-simple-nightshade-the-first-step-towards-a-sharded-blockchain/ and near/NEPs#241
@mzhangmzz @bowenwang1996 Is it ready to be merged? |
This was implemented more than a year ago so let's merge the pull request. @mzhangmzz could you resolve conflicts and merge the PR? Thanks! |
No description provided.