fix: reduce max packet receive time during leader window #2801
Conversation
@bw-solana Could you review this simple change? I have tested very similar changes previously.
As @cavemanloverboy mentioned, we can schedule or drop all txs in our buffer and then enter a 100ms receive while leader, which can be extremely slow.
Before we get a better type for deserialization, I think this is a reasonable stop-gap solution.
The code change itself is straightforward, but since I've made the previous changes in this area, it's a bit sketch for me to approve this myself.
LGTM.
"Okay, it's bad we'll wait up to 100ms, but surely we'll hit the packet limit first in the practical case"
checks packet limit
🤡
i never added the damn ci flag. will watch this and merge
What are the implications of this change?
from OP: "In certain cases (if the transaction container is emptied during a leader's window), the scheduler controller may wait up to 100 milliseconds for incoming packets." There is a (somewhat rare) case where the scheduler will collect packets for 100 ms before it even begins scheduling. That's 1/4 of a slot... This change reduces that wait to 10 ms, so there is never a stretch where the scheduler is twiddling its thumbs waiting for packets.
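To make the mechanism concrete, here is a minimal sketch (not the actual agave code) of the kind of bounded receive loop being described, assuming a crossbeam-style channel; the names `MAX_RECEIVE_TIME` and `receive_until` are illustrative only:

```rust
use std::time::{Duration, Instant};

// Illustrative constant; the PR changes the real value from 100 ms to 10 ms.
const MAX_RECEIVE_TIME: Duration = Duration::from_millis(10);

/// Hypothetical sketch of a bounded packet-receive loop: with an empty
/// transaction container, the scheduler keeps pulling packets until it hits
/// either the packet limit or the deadline, so this timeout is the upper
/// bound on how long it can sit idle instead of scheduling.
fn receive_until(
    receiver: &crossbeam_channel::Receiver<Vec<u8>>,
    max_packets: usize,
) -> Vec<Vec<u8>> {
    let deadline = Instant::now() + MAX_RECEIVE_TIME;
    let mut packets = Vec::with_capacity(max_packets);
    while packets.len() < max_packets {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match receiver.recv_timeout(remaining) {
            Ok(packet) => packets.push(packet),
            // Stop on timeout or a disconnected channel.
            Err(_) => break,
        }
    }
    packets
}
```

If traffic is light and the packet limit is never reached, the timeout is the worst-case delay before scheduling resumes, which is why shrinking it matters during the leader window.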
Backports to the stable branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc. that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule.
(cherry picked from commit 20e0df4)
fix: reduce max packet receive time during leader window (backport of #2801) (#4544): fix: reduce max packet receive time during leader window (#2801) (cherry picked from commit 20e0df4) Co-authored-by: cavemanloverboy <[email protected]>
* v2.0: Reclaims more old accounts in `clean` (backport of anza-xyz#4044) (anza-xyz#4089)
  * Reclaims more old accounts in `clean` (anza-xyz#4044) (cherry picked from commit 3d43824)
  * Conflicts in accounts-db/src/accounts_db.rs and accounts-db/src/accounts_db/tests.rs; fix merge conflicts
  * Co-authored-by: Brooks <[email protected]>
* v2.0: Fixes clean_old_storages_with_reclaims tests (backport of anza-xyz#4147) (anza-xyz#4166)
  * Fixes clean_old_storages_with_reclaims tests (anza-xyz#4147) (cherry picked from commit 4eabeed)
  * Conflicts in accounts-db/src/accounts_db/tests.rs; fix merge conflicts
  * Co-authored-by: Brooks <[email protected]>
* v2.0: blockstore: mark slot as dead on data shred merkle root conflict (backport of anza-xyz#3970) (anza-xyz#4074)
  * blockstore: mark slot as dead on data shred merkle root conflict (anza-xyz#3970) (cherry picked from commit 5564a94)
  * Conflicts in ledger/src/blockstore.rs; fix conflicts
  * Co-authored-by: Ashwin Sekar <[email protected]>
  * Co-authored-by: Ashwin Sekar <[email protected]>
* Bump version to v2.0.22 (anza-xyz#4200)
  * Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* v2.0: hardcode rust version for publish-crate (anza-xyz#4228)
* Bump version to v2.0.23 (anza-xyz#4419)
  * Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* v2.0: rolls out chained Merkle shreds to ~21% of mainnet slots (backport of anza-xyz#4431) (anza-xyz#4434)
  * rolls out chained Merkle shreds to ~21% of mainnet slots (anza-xyz#4431) (cherry picked from commit 9d09787)
  * Co-authored-by: behzad nouri <[email protected]>
* v2.0: [rpc] Fatal `getSignaturesForAddress()` when Bigtable errors (backport of anza-xyz#3700) (anza-xyz#4442)
  * Unindent code in `get_signatures_for_address`
  * Add a custom JSON-RPC error to throw when long-term storage (i.e. Bigtable) can't be reached
  * When the `before`/`until` signatures can't be found, throw `SignatureNotFound` instead of `RowNotFound`
  * Fatal `getSignaturesForAddress` calls when Bigtable must be queried but can't be reached (cherry picked from commit 52f132c)
  * Co-authored-by: Steven Luscher <[email protected]>
* v2.0: ci: bump [upload|download]-artifact to v4 (anza-xyz#4501)
* v2.0: ci: hardcode crate publishing version (anza-xyz#4515)
* Bump version to v2.0.24 (anza-xyz#4528)
  * Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* v2.0: fix: reduce max packet receive time during leader window (backport of anza-xyz#2801) (anza-xyz#4544)
  * fix: reduce max packet receive time during leader window (anza-xyz#2801) (cherry picked from commit 20e0df4)
  * Co-authored-by: cavemanloverboy <[email protected]>
* v2.0: Scheduler Frequency Fixes (backport of anza-xyz#4545) (anza-xyz#4576)
  * Change prio_graph_scheduler configurations for 1k maxs, 256 look ahead
  * Break loop on scanned transaction count
  * Make Hold decision behave same as Consume during receive
  * Receive maximum of 5_000 packets (loose max)
  * receive_completed before process_transactions
  * Co-authored-by: Andrew Fitzgerald <[email protected]>
* Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>, Brooks <[email protected]>, Ashwin Sekar <[email protected]>, Ashwin Sekar <[email protected]>, github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>, Yihau Chen <[email protected]>, behzad nouri <[email protected]>, Steven Luscher <[email protected]>, cavemanloverboy <[email protected]>, Andrew Fitzgerald <[email protected]>
Problem
In certain cases (if the transaction container is emptied during a leader's window), the scheduler controller may wait up to 100 milliseconds for incoming packets.
Summary of Changes
Reduce the constant max wait from 100 ms to 10 ms.
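For reference, the effective diff is roughly a one-line constant change; the sketch below uses an illustrative constant name rather than the exact identifier in the scheduler controller:

```rust
use std::time::Duration;

// Illustrative name, not the real identifier: this bounds how long the
// scheduler controller blocks waiting for incoming packets when its
// transaction container is empty during the leader window.
// At 100 ms the wait could eat up to ~25% of a 400 ms slot; at 10 ms the
// worst-case idle window is ~2.5% of a slot.
const MAX_PACKET_RECEIVE_TIME: Duration = Duration::from_millis(10); // previously 100
```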