Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make pivot update attempts infinite #5988

Merged
merged 3 commits into from
Aug 9, 2023

Conversation

marcindsobczak
Copy link
Contributor

Resolves #5986

Changes

  • Change MaxAttemptsToUpdatePivot to infinite in practice (int.MaxValue equal 0x7fffffff, equal 2147483647, equal ~68 years)
  • remove attempts left from logs

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: change of config value

Testing

Requires testing

  • Yes
  • No

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

@LukaszRozmej
Copy link
Member

Ok but is that safe?

@kamilchodola
Copy link
Contributor

@LukaszRozmej
Only thing which would happen here is that we won't start OldHeaders until CL will be properly configured - with 15 minutes timeout it was started but there was this edge case with restarts during sync etc.

So I'd even say it is more safe than current approach because fallback like we had before was another testing path which would need to be covered all the time - now we reduce possible paths (this scenario will be possible only when someone would manually start to play with this config value to make it happen).

Thing is that in reality we don't fix real issue (unable to "catch" missing headers if that occurs - this is because of nature how we download headers from latest to oldest - missing ones are more recent than initial PIVOT set) but ensure that it won't happen in default configuration.

Proper fix would be problematic and I guess it would cause more regression than this approach.

@@ -61,7 +61,7 @@ public interface ISyncConfig : IConfig
[ConfigItem(DisabledForCli = true, HiddenFromDocs = true)]
Keccak? PivotHashParsed => PivotHash is null ? null : new Keccak(Bytes.FromHexString(PivotHash));

[ConfigItem(Description = "Max number of attempts (seconds) to update pivot block basing on Forkchoice message from Consensus Layer. Only for PoS chains.", DefaultValue = "900")]
[ConfigItem(Description = "Max number of attempts (seconds) to update pivot block basing on Forkchoice message from Consensus Layer. Only for PoS chains. Infinite by default.", DefaultValue = "2147483647")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marcindsobczak Why not some value like -1? I guess this is because setting Int.Max is quicker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just have logic which is checking if attemts left > 0 in few places, making support for -1 will be nice, but requires bigger refactoring and it is always some risk. If we are not planning to use values between disabled and infinite, we could make it bool as well. But doing it extremely high (int.Max = 68 years) works the same as infinite, is quick and safest, as doesn't require any changes in the code, just config value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is what I expected to hear - we can go safe way but would be nice to leave some issue to refactor this in future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created low priority good first issue for refactoring it to be infinite instead of ~68 years: #5992

@@ -141,7 +141,7 @@ private async Task<bool> TrySetFreshPivot(CancellationToken cancellationToken)

if (finalizedBlockHash is null || finalizedBlockHash == Keccak.Zero)
{
if (_logger.IsInfo && _attemptsLeft % 10 == 0) _logger.Info($"Waiting for Forkchoice message from Consensus Layer to set fresh pivot block. {_attemptsLeft} attempts left");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attempts left was pretty nice but if we keep int.max approach it does not have sense indeed. If You would plan to add some value like -1 to make it infinite instead of this approach, please rollback this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it will be infinite, then it will have even less sense, because infinite number of attempt will be always left. We can add one more variable and just increment it - it will be something like:
Waiting for Forkchoice message from Consensus Layer to set fresh pivot block [{counter}s]

And for users
Waiting for Forkchoice message from Consensus Layer to set fresh pivot block [1s]
Waiting for Forkchoice message from Consensus Layer to set fresh pivot block [2s]
...
Waiting for Forkchoice message from Consensus Layer to set fresh pivot block [123s]

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! :)

@marcindsobczak
Copy link
Contributor Author

Ok but is that safe?

When introducing PivotUpdator we made a timeout of 15min as a safety guarantee, with intention to make it infinite some day. In the meantime it proved that is working fine, the only issues are because of edge-cases caused by timeout. We are not able to sync if there is no CL - that's one thing, and downloading headers in the meantime is problematic, because in case of restart at this stage, pivot will be updated and we will end up with a gap in headers, between old pivot and new pivot. So we have two options - disable functionality of PivotUpdator if timeout would be hit, or make it infinite. In first scenario there will be no corruption (gap in headers), but after connecting with proper CL sync will start from old, hardcoded pivot, so will take way longer. Second option is to wait with downloading headers until pivot will be updated, then state sync can start almost immediately. In second option the whole sync process in most cases would be faster, so I recommend this approach.

We could mix these approaches and make timeout e.g. 6h and if it will be hit, just save metadata to never update pivot on that node. In most cases timeout will never hit - 6h is enough time to wait for setting up CL. And if there is something wrong with setup, it will just start downloading headers after 6h, just to not wait infinitely. But it will not be able to start state sync until CL connection would be fixed anyway, and after fixing it, it will need to download a lot of blocks to start state sync, so doesn't have much sense

@marcindsobczak marcindsobczak merged commit dbb8c0b into master Aug 9, 2023
@marcindsobczak marcindsobczak deleted the feature/infinite_pivot_update branch August 9, 2023 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove timeout from AutoPivot to avoid situation with missing Headers
3 participants