mitigating problems with the network due to poor atxs propagation and bloated active sets #5366
spacemesh-bors bot pushed a commit that referenced this issue on Dec 21, 2023
related: #5366 this change disables sync reconciliation at the end and start of epoch, normal atx downloading works the way it was working. instead it will ask peers for atxs that are recorded in the set, shared by the server.
spacemesh-bors bot pushed a commit that referenced this issue on Dec 21, 2023
…5378) ## Motivation related to #5366 Allow to set a fallback active set to be used by the proposal builder via the same mechanism as we use for the fallback beacon. ## Changes - some minor fixes in `bootstrap` cmd code - extend `ProposalBuilder` to be able to set a specific active set for an epoch - extend node routine listening for updates to fallback beacon / activeset to propagate a fallback to the `ProposalBuilder` ## Test Plan - added test case to `ProposalBuilder` to verify use of fallback active set if available - manual testing on `testnet-10` ## TODO <!-- This section should be removed when all items are complete --> - [x] Explain motivation or link existing issue(s) - [x] Test changes and document test plan - [x] Update documentation as needed - [x] Update [changelog](../CHANGELOG.md) as needed
This was referenced Jan 15, 2024
there are two problems contributing to the extreme bandwidth usage. both are the result of abnormal growth in the number of atxs.
the first problem comes from the process called atx sync. this process queries all known atxs in the current/next epoch and then fetches those that are missing from 20 peers (configurable). this process is neither scalable nor effective, as it doesn't guarantee that a node will download all missing atxs. it can still miss them due to concurrency and data availability.
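a minimal sketch of the query-then-fetch step described above, to show why it can miss atxs. `ATXID`, `missingATXs` and the in-memory peer answers are hypothetical names made up for illustration; this is not the go-spacemesh implementation.

```go
package main

import "fmt"

// ATXID is a hypothetical stand-in for an atx identifier.
type ATXID string

// missingATXs returns the ids advertised by the first maxPeers peers that are
// absent from the local set. Each peer answer is a snapshot: ids published
// after the snapshot, or known only to peers that were not asked, are never
// discovered by this pass.
func missingATXs(local map[ATXID]struct{}, peerSets [][]ATXID, maxPeers int) []ATXID {
	if maxPeers < 0 {
		maxPeers = 0
	}
	if maxPeers > len(peerSets) {
		maxPeers = len(peerSets)
	}
	seen := make(map[ATXID]struct{})
	var missing []ATXID
	for _, ids := range peerSets[:maxPeers] {
		for _, id := range ids {
			if _, ok := local[id]; ok {
				continue // already stored locally
			}
			if _, dup := seen[id]; dup {
				continue // already scheduled for download
			}
			seen[id] = struct{}{}
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	local := map[ATXID]struct{}{"a": {}, "b": {}}
	// the third peer is the only one that knows "e"
	peers := [][]ATXID{{"a", "c"}, {"b", "c", "d"}, {"e"}}
	fmt.Println(missingATXs(local, peers, 2)) // "e" is not found with a limit of 2 peers
}
```

in this toy run the node asks two peers and never learns about `e`. the cost also grows with the number of atxs multiplied by the number of peers queried, which is the scalability concern.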
the second problem is the result of poor atx propagation and of atx sync not doing its job. as a result, many nodes are using slightly different active sets. the active set is referenced in the first ballot that a node produces. when a node receives a proposal that includes such a first ballot, it will ask for the full active set if it sees that the referenced active set is not stored locally. this is also the reason why people are missing rewards.
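a small sketch of why slightly different active sets are expensive. the hashing scheme (`hashActiveSet`) and the `store` type are assumptions made for illustration and do not reflect the real codec; the point is only that a one-atx difference yields a different reference, so the receiver has to request the whole set.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// ATXID and Hash are hypothetical stand-ins for the real types.
type ATXID string
type Hash [32]byte

// hashActiveSet hashes the sorted ids (illustrative scheme only).
func hashActiveSet(ids []ATXID) Hash {
	sorted := append([]ATXID(nil), ids...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	h := sha256.New()
	for _, id := range sorted {
		h.Write([]byte(id))
		h.Write([]byte{0}) // separator so ids cannot run together
	}
	var out Hash
	copy(out[:], h.Sum(nil))
	return out
}

// store maps an active set reference to the locally stored set.
type store map[Hash][]ATXID

// needsFetch reports whether the active set referenced by a first ballot
// has to be requested in full from a peer.
func (s store) needsFetch(ref Hash) bool {
	_, ok := s[ref]
	return !ok
}

func main() {
	mine := []ATXID{"a", "b", "c"}
	theirs := []ATXID{"a", "b", "c", "d"} // differs by a single atx

	local := store{hashActiveSet(mine): mine}
	fmt.Println(local.needsFetch(hashActiveSet(mine)))   // false
	fmt.Println(local.needsFetch(hashActiveSet(theirs))) // true
}
```

with many nodes each referencing a slightly different set, every such reference can trigger a full-size request during consensus.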
short term mitigation
we want to achieve two things:
we can achieve both of those goals by sharing the active set from our server. this mechanism was introduced in the codebase for mitigating certain failures in the hare oracle, and can be reused to compute the tortoise active set.
these changes will eliminate large requests during the consensus phase and, by my estimate, will solve most existing issues with proposals getting lost.
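a rough sketch of how a server-shared active set could be preferred over the locally computed one. `proposalBuilder`, `SetFallbackActiveSet` and `activeSet` are hypothetical names for illustration, not the actual `ProposalBuilder` api.

```go
package main

import "fmt"

// ATXID and EpochID are hypothetical stand-ins for the real types.
type ATXID string
type EpochID uint32

// proposalBuilder is a heavily reduced, hypothetical builder that only
// tracks fallback active sets per epoch.
type proposalBuilder struct {
	fallback map[EpochID][]ATXID
}

// SetFallbackActiveSet records the set shared by the server for an epoch.
func (b *proposalBuilder) SetFallbackActiveSet(epoch EpochID, set []ATXID) {
	if b.fallback == nil {
		b.fallback = make(map[EpochID][]ATXID)
	}
	b.fallback[epoch] = append([]ATXID(nil), set...) // copy to avoid aliasing
}

// activeSet returns the fallback for the epoch when one was shared,
// otherwise the locally computed set.
func (b *proposalBuilder) activeSet(epoch EpochID, local []ATXID) ([]ATXID, bool) {
	if fb, ok := b.fallback[epoch]; ok {
		return fb, true
	}
	return local, false
}

func main() {
	var b proposalBuilder
	local := []ATXID{"a", "b"}

	set, shared := b.activeSet(7, local)
	fmt.Println(set, shared) // [a b] false

	b.SetFallbackActiveSet(7, []ATXID{"a", "b", "c"})
	set, shared = b.activeSet(7, local)
	fmt.Println(set, shared) // [a b c] true
}
```

when every node uses the same shared set for an epoch, first ballots reference the same active set, so receivers do not need to request differing full sets.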
long term mitigation
long term we need to get rid of the centralized infrastructure. to achieve that, atx propagation has to be reliable, so that any honest node is able to download an atx within 30m (this is configurable).
sync changes for that include a better atx sync protocol #5306 and replacing collection requests with streams, so that limits are not hard-coded in the codec specification #5278.
the activeset should be encoded as a diff, rather than re-encoding the whole activeset #5282. additionally, we may consider building the active set deterministically from one or several blocks, #5288 (this is a change that can be finished to work properly).
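a minimal sketch of the diff idea, assuming both sides already share a base set. `Diff`, `diff` and `apply` are hypothetical names; the design actually proposed in #5282 may look different.

```go
package main

import (
	"fmt"
	"sort"
)

// ATXID is a hypothetical stand-in for an atx identifier.
type ATXID string

// Diff lists the ids to add to and remove from a base set both sides know.
type Diff struct {
	Added, Removed []ATXID
}

func toSet(ids []ATXID) map[ATXID]struct{} {
	s := make(map[ATXID]struct{}, len(ids))
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

func sortIDs(ids []ATXID) {
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
}

// diff computes what has to be sent to turn base into next.
func diff(base, next []ATXID) Diff {
	b, n := toSet(base), toSet(next)
	var d Diff
	for id := range n {
		if _, ok := b[id]; !ok {
			d.Added = append(d.Added, id)
		}
	}
	for id := range b {
		if _, ok := n[id]; !ok {
			d.Removed = append(d.Removed, id)
		}
	}
	sortIDs(d.Added) // sort for a deterministic encoding
	sortIDs(d.Removed)
	return d
}

// apply rebuilds the full (sorted) set from the base and a diff.
func apply(base []ATXID, d Diff) []ATXID {
	s := toSet(base)
	for _, id := range d.Removed {
		delete(s, id)
	}
	for _, id := range d.Added {
		s[id] = struct{}{}
	}
	out := make([]ATXID, 0, len(s))
	for id := range s {
		out = append(out, id)
	}
	sortIDs(out)
	return out
}

func main() {
	base := []ATXID{"a", "b", "c"}
	next := []ATXID{"b", "c", "d", "e"}
	d := diff(base, next)
	fmt.Println(d.Added, d.Removed) // [d e] [a]
	fmt.Println(apply(base, d))     // [b c d e]
}
```

when two active sets differ by only a few atxs, the diff carries a handful of ids instead of the whole set.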
besides the sync and activeset problems, there are still problems with atx propagation. improvements in that area include:
in my opinion, the sync changes outlined above, deterministically generating the activeset, and atx changes 1 and 2 are the most important.