feat: adds CLI flags to delay publishing for edge case testing on PeerDAS devnets #6947

Open
wants to merge 3 commits into
base: unstable

Conversation

kamuik16
Contributor

@kamuik16 kamuik16 commented Feb 7, 2025

Issue Addressed

Closes #6919

Additional Info

  • Added two optional config fields (block_publishing_delay and data_column_publishing_delay) to ChainConfig, with defaults set to None.

  • CLI Flags:
    Introduced hidden CLI arguments (--delay-block-publishing and --delay-data-column-publishing) to set delays (in seconds) for testing purposes.

  • For block publishing: Modified the publish_block_p2p closure to accept a delay parameter. If set, the closure calls std::thread::sleep(delay) before publishing.

  • For data columns: Before calling publish_column_sidecars, the code awaits a tokio::time::sleep(delay).
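As a rough illustration of the config and flag handling described above, the sketch below models the two optional fields and the seconds-to-`Duration` conversion. `PublishingDelays`, `parse_delay_secs`, and `delays_from_flags` are hypothetical names used only for this example, and parsing whole seconds as `u64` is an assumption; the PR's actual `ChainConfig` and CLI wiring may differ.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the two new optional fields on `ChainConfig`.
/// Both default to `None`, meaning "no delay".
#[derive(Debug, Default, PartialEq)]
pub struct PublishingDelays {
    pub block_publishing_delay: Option<Duration>,
    pub data_column_publishing_delay: Option<Duration>,
}

/// Converts an optional flag value (seconds, as text) into an optional `Duration`.
/// Assumption: the flag takes whole seconds; an absent flag stays `None`.
pub fn parse_delay_secs(raw: Option<&str>) -> Result<Option<Duration>, String> {
    raw.map(|s| {
        s.trim()
            .parse::<u64>()
            .map(Duration::from_secs)
            .map_err(|e| format!("invalid delay {:?}: {}", s, e))
    })
    .transpose()
}

/// Builds the delay config from the raw values of
/// `--delay-block-publishing` and `--delay-data-column-publishing`.
pub fn delays_from_flags(
    block: Option<&str>,
    data_column: Option<&str>,
) -> Result<PublishingDelays, String> {
    Ok(PublishingDelays {
        block_publishing_delay: parse_delay_secs(block)?,
        data_column_publishing_delay: parse_delay_secs(data_column)?,
    })
}
```

With neither flag supplied, `delays_from_flags(None, None)` yields the default config, so the normal publishing path is unaffected.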

Known Limitations

  • block_publishing_delay only works in the default broadcast validation mode, Gossip. This is the mode used in most testing. The delay was not added to the other modes to avoid additional complexity.
  • data_column_publishing_delay: if block_publishing_delay is also set, data column publishing is delayed by at least block_publishing_delay seconds. This limitation is probably acceptable: the scenario is not important enough to be worth potentially impacting the mainnet code path.

Reason for Different Sleep Methods:

  • std::thread::sleep: Used in the synchronous closure publish_block_p2p because async closures are unstable and .await cannot be used there.
  • tokio::time::sleep: Used in the async part of the function to avoid blocking the executor.
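A minimal sketch of the synchronous case, assuming a simplified closure rather than the real `publish_block_p2p`: `make_delayed_publisher` and `timed_publish` are hypothetical helpers, and the string result stands in for the actual network publish. Because the closure is not async, the only way to wait inside it is `std::thread::sleep`, which blocks the calling OS thread for at least the requested duration; in an async function the counterpart would be `tokio::time::sleep(delay).await`, which yields to the executor instead.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Builds a synchronous "publish" closure with an optional delay.
/// Hypothetical stand-in for `publish_block_p2p`; it only formats a string.
pub fn make_delayed_publisher(delay: Option<Duration>) -> impl Fn(&str) -> String {
    move |block_root: &str| {
        if let Some(delay) = delay {
            // Blocks the current OS thread; `.await` is not available
            // inside a non-async closure.
            thread::sleep(delay);
        }
        format!("published {}", block_root)
    }
}

/// Calls the closure once and reports how long the call took.
pub fn timed_publish(delay: Option<Duration>) -> (String, Duration) {
    let publish = make_delayed_publisher(delay);
    let start = Instant::now();
    let out = publish("0xabc");
    (out, start.elapsed())
}
```

The blocking call is what the reviewers push back on below: a thread parked in `thread::sleep` cannot run other tasks, which matters on resource-constrained test machines.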

I also might be completely wrong here, do correct me, or feel free to discard the PR if this is not the solution 😄.

@chong-he chong-he added the ready-for-review The code is ready for review label Feb 10, 2025
@jimmygchen jimmygchen added the das Data Availability Sampling label Feb 10, 2025
// Add delay before publishing the block to the network.
if let Some(block_publishing_delay) = block_publishing_delay {
    std::thread::sleep(block_publishing_delay);
}
Member

Okay so after messing with this for a while:

  • std::thread::sleep: Used in the synchronous closure publish_block_p2p because async closures are unstable and .await cannot be used there.

Yeah I couldn't see an easy way of converting both to an async context. It's a shame because we are in an async context already :/

Member

@jimmygchen jimmygchen Feb 13, 2025

Even though this is for testing only, a blocking sleep may significantly impact testing performance, and it may be even worse given that we usually run tests in resource-constrained environments (e.g. Kurtosis, 4+ nodes on a machine).

I'm thinking maybe we just do the block publishing delay for the BroadcastValidation::Gossip case, since we mainly use this in devnet testing?

    if BroadcastValidation::Gossip == validation_level && should_publish_block {
        // add delay here
        publish_block_p2p(
            block.clone(),
            sender_clone.clone(),
            log.clone(),
            seen_timestamp,
        )
        .map_err(|_| warp_utils::reject::custom_server_error("unable to publish".into()))?;
    }

and if we really want to cover the other two broadcast variants, async closure is stabilising in two weeks... 🤩

Member

^ As @ethDreamer pointed out, the issue with this is that it also delays data column publishing, and I don't see any easy way around it without making this function even more complex. That said, I think the specific scenario where we want to test block delay without column delay is pretty low value, so we can potentially just leave this as a known issue.

Member

@jimmygchen jimmygchen Feb 19, 2025

@ethDreamer Please see comment above - I ended up just delaying the BroadcastValidation::Gossip code path, using tokio::time::sleep
e3957d3

Member

@ethDreamer ethDreamer Feb 19, 2025

okay so the logic now reduces to:

if validation_level == Gossip {
    sleep(block_publishing_delay)
    publish_block()
    sleep(max(data_column_publishing_delay-block_publishing_delay, 0))
    publish_data_columns()
} else {
    sleep(data_column_publishing_delay)
    publish_data_columns()
    publish_block()
}

which seems okay for testing but maybe we want to leave a TODO or something to either refactor this or rip this out once we're done testing?
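The reduced logic above can be sketched as a pure function that returns the ordered steps instead of performing them. This mirrors the reviewer's pseudocode, not the merged implementation: `Step` and `publish_plan` are hypothetical names, the local `BroadcastValidation` enum is a stand-in for the real one, and treating an unset delay as zero is an assumption.

```rust
use std::time::Duration;

/// Local stand-in for the broadcast validation levels named in the thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BroadcastValidation {
    Gossip,
    Consensus,
    ConsensusAndEquivocation,
}

/// One step of the (hypothetical) publishing plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Sleep(Duration),
    PublishBlock,
    PublishDataColumns,
}

/// Returns the ordered steps for a given validation level and pair of delays.
/// Assumption: an unset delay behaves like a zero-length sleep.
pub fn publish_plan(
    level: BroadcastValidation,
    block_delay: Option<Duration>,
    column_delay: Option<Duration>,
) -> Vec<Step> {
    let block_delay = block_delay.unwrap_or(Duration::ZERO);
    let column_delay = column_delay.unwrap_or(Duration::ZERO);

    if level == BroadcastValidation::Gossip {
        vec![
            Step::Sleep(block_delay),
            Step::PublishBlock,
            // max(column_delay - block_delay, 0): `saturating_sub` clamps at zero,
            // so columns are never published earlier than the block delay.
            Step::Sleep(column_delay.saturating_sub(block_delay)),
            Step::PublishDataColumns,
        ]
    } else {
        // Other modes: only the data column delay applies.
        vec![
            Step::Sleep(column_delay),
            Step::PublishDataColumns,
            Step::PublishBlock,
        ]
    }
}
```

In the Gossip branch, a block delay of 5 s combined with a column delay of 2 s still publishes columns 5 s in, which is the known limitation listed in the PR description.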

beacon_node/http_api/src/publish_blocks.rs (resolved)
beacon_node/src/cli.rs (resolved)
Member

@jimmygchen jimmygchen left a comment

Nice work @kamuik16!

Please see the comments above. I'm OK with the trade-off of not covering the following to keep things simple (it would be worth documenting in the CLI description though!)

  1. delay publishing blocks without delaying columns - this scenario is less important, since we expect other nodes in the network to publish columns anyway due to distributed blob publishing.
  2. covering BroadcastValidation::Consensus and BroadcastValidation::ConsensusAndEquivocation: if we just implement it for Gossip (which is the main one used in testing), then we shouldn't need the thread::sleep.

What do you think?

@jimmygchen jimmygchen added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Feb 13, 2025
@jimmygchen
Member

@kamuik16

Any thoughts on the comments?

It would be handy to have this available to test the current active devnet. Would you have time to work on this in the next week? It's no problem at all if you're busy with other things; just let us know and we can continue where you left off ☺️

Thanks!

@kamuik16
Contributor Author

Hey @jimmygchen and @ethDreamer, thanks for the comments. I understand them, but I don't have enough context to make the suggested code changes, and it would take me some time. I'm also busy with other things, so it would be really great if you could continue where I left off.

@jimmygchen
Member

No problem, thank you @kamuik16 for your work 🙏 We'll take care of the rest.

@jimmygchen jimmygchen added work-in-progress PR is a work-in-progress and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels Feb 18, 2025
@kamuik16
Contributor Author

Thanks and sorry for leaving in between :(

@jimmygchen jimmygchen self-assigned this Feb 19, 2025
@jimmygchen jimmygchen added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Feb 19, 2025
@jimmygchen
Member

Hey @ethDreamer,
I've addressed our review comments, would you mind re-reviewing this please?
Thanks!

Labels
das Data Availability Sampling ready-for-review The code is ready for review
4 participants