Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make logs more verbose during initial ATX sync #5182

Closed
lrettig opened this issue Oct 19, 2023 · 3 comments
Closed

Make logs more verbose during initial ATX sync #5182

lrettig opened this issue Oct 19, 2023 · 3 comments

Comments

@lrettig
Copy link
Member

lrettig commented Oct 19, 2023

Description

When syncing a new node from scratch, the first ~30 minutes (on a relatively powerful system) appear to be spent downloading and verifying ATXs and their proofs. This happens before blocks are downloaded. There are two possible issues here:

  • this processing work consumes 100% of every available system core, in bursts. Do we want to limit this to, e.g., 3/4 of the available cores as we do for later proof verification work? (This is an anecdotal observation, I could be wrong about the fact that it's currently not limited.)
  • there's zero information in the log, or node status, or event log about the work that's being done. During this time the node appears to be doing nothing at all and making no progress on sync, aside from the sporadic CPU bursts. Can we print some progress here, both in the log and API, along the lines of "X of Y ATXs downloaded and processed", or "processed epoch X of Y"?

Affected code

Unknown

This issue appears in commit hash: v1.2.2

See these two (private) Slack threads: one two

@dshulyak
Copy link
Contributor

we do limit number of workers to 3 / 4 by default https://github.com/spacemeshos/go-spacemesh/blob/develop/activation/post.go#L89 .

atleast on my computer it works exactly as expected and uses 18 cores out of 24, so the load in top during this initial period is 1800%.

@lrettig
Copy link
Member Author

lrettig commented Oct 25, 2023

This is what it looks like:

image

I think it is limited to 3/4 of cores, which seems okay.

The main thing we need to do is add info to the log during this phase.

@lrettig lrettig changed the title Proof verification on initial sync pegs CPU Make logs more verbose during initial ATX sync Oct 25, 2023
@dshulyak
Copy link
Contributor

in the v1.4 there is information like below

2024-03-12T14:46:03.491Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 12, "downloaded": 384824, "total": 479413, "progress": 49629, "rate per sec": 96.8168506743115}
2024-03-12T14:53:16.557Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 12, "downloaded": 433142, "total": 479413, "progress": 48318, "rate per sec": 111.57186359119548}
2024-03-12T15:01:56.830Z        INFO    node.sync       atx sync completed      {"sessionId": "f8340c97", "epoch_id": 12, "downloaded": 479413, "total": 479413, "unavailable": 0, "duration": "2h54m23.936593989s"}
2024-03-12T15:02:01.870Z        INFO    node.sync       downloaded epoch info   {"sessionId": "f8340c97", "epoch_id": 13, "peer": "12D3KooWM6fR5PSHqcNT9cWf7DN8fxCthVZ2xDZn5ruuZLm6sYwY", "atxs": 767659}
2024-03-12T15:02:03.797Z        INFO    node.sync       downloaded epoch info   {"sessionId": "f8340c97", "epoch_id": 13, "peer": "12D3KooWKCCgdkWpkX8vmJMktKQgZo62tCvZKXDgPy7crTWPvwLh", "atxs": 767659}
2024-03-12T15:22:17.246Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 2181, "total": 767659, "progress": 2181, "rate per sec": 1.787096237173516}
2024-03-12T15:42:17.589Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 16815, "total": 767659, "progress": 14634, "rate per sec": 12.191508942598723}
2024-03-12T16:02:35.469Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 34068, "total": 767659, "progress": 17253, "rate per sec": 14.166421080231062}
2024-03-12T16:22:46.508Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 52232, "total": 767659, "progress": 18164, "rate per sec": 14.998696363917372}
2024-03-12T16:43:06.594Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 78927, "total": 767659, "progress": 26695, "rate per sec": 21.87960233535297}
2024-03-12T17:03:08.376Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 108447, "total": 767659, "progress": 29520, "rate per sec": 24.563528051871266}
2024-03-12T17:23:12.726Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 156285, "total": 767659, "progress": 47838, "rate per sec": 39.72101291810472}
2024-03-12T17:43:12.884Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 204909, "total": 767659, "progress": 48624, "rate per sec": 40.51468153461133}
2024-03-12T18:03:21.317Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 259306, "total": 767659, "progress": 54397, "rate per sec": 45.01447156464441}
2024-03-12T18:23:29.482Z        INFO    node.sync       atx sync progress       {"sessionId": "f8340c97", "epoch_id": 13, "downloaded": 320176, "total": 767659, "progress": 60870, "rate per sec": 50.3822240623955}

i think it is the right amount of verbose, there is also api that can be used to change log level temporarily if this is desirable

grpcurl -plaintext -d '{"module": "*", "level": "debug"}' 127.0.0.1:9093 spacemesh.v1.DebugService.ChangeLogLevel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants