"State heal in progress" after sync forever #1198

Closed
DaveWK opened this issue Nov 22, 2022 · 10 comments
Labels
question Further information is requested X-nodesync task filter for node sync issue: full, snap, light...

Comments

@DaveWK

DaveWK commented Nov 22, 2022

System information

Geth version: 1.17
OS & Version: Linux, Fedora 37

Expected behaviour

Finishes syncing and able to use RPC

Actual behaviour

Never seems to finish syncing, keeps saying

 t=2022-11-22T15:57:45+0000 lvl=info msg="State heal in progress"                 accounts=10,522,[email protected] slots=22,809,[email protected]      [email protected]      nodes=135,122,[email protected] pending=61470

in logs

Steps to reproduce the behaviour

Node is an AWS c6a.8xlarge with plenty of disk I/O and space.

@forcodedancing forcodedancing self-assigned this Nov 23, 2022
@forcodedancing forcodedancing added the question Further information is requested label Nov 23, 2022
@forcodedancing
Contributor

@DaveWK Are you syncing for the first time? Are you using a snapshot from https://github.com/bnb-chain/bsc-snapshots ?

@DaveWK
Author

DaveWK commented Nov 23, 2022

Synced from genesis using --syncmode=snap; it has been syncing for 4 days. In the logs I see:

t=2022-11-23T03:38:10+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="500.329µs" number=23,293,067 hash=0x621e9128bad96d6db5bc8e5682abb77f3d5156193a1ffdb327bdfe65ddc65c23
t=2022-11-23T03:38:13+0000 lvl=info msg="State heal in progress"                 accounts=736,[email protected] slots=1,730,[email protected] [email protected] nodes=151,785,[email protected] pending=51447
t=2022-11-23T03:38:13+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="617.301µs" number=23,293,068 hash=0x3f71527b518f67ab546b5299a6b5ce3d88c7889fc822cce01e123b7f914473dc
t=2022-11-23T03:38:17+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="493.629µs" number=23,293,069 hash=0x17e9497ba49b0bef872fda370f1c086a4c14a4ed4bc91c4e7bef2ed27bc29d15
t=2022-11-23T03:38:20+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="502.819µs" number=23,293,070 hash=0x083121f3f33df93604b401d3c8309737c99d0336492f0f9a38dc7c1e1303fe22
t=2022-11-23T03:38:21+0000 lvl=info msg="State heal in progress"                 accounts=736,[email protected] slots=1,731,[email protected] [email protected] nodes=151,790,[email protected] pending=49316
t=2022-11-23T03:38:23+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="621.872µs" number=23,293,071 hash=0x373afe1bc7af546b20c223e6c2e08c172dc7d3558628e8c518d683494e7a14f1
t=2022-11-23T03:38:26+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="650.561µs" number=23,293,072 hash=0x9fd4b29e214c6eb26d2cc1d787047b6660829218f504fa46eb572b58d049d1e0
t=2022-11-23T03:38:27+0000 lvl=info msg="Downloader queue stats"                 receiptTasks=0 blockTasks=0 itemSize="148.44 KiB" throttle=1766
t=2022-11-23T03:38:29+0000 lvl=info msg="State heal in progress"                 accounts=737,[email protected] slots=1,733,[email protected] [email protected] nodes=151,795,[email protected] pending=47242
t=2022-11-23T03:38:29+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="738.533µs" number=23,293,073 hash=0x184d87f3c62f9a19167ccb6105b61f658e8c6cf287d5a8fbbdae39a2d15d4324
t=2022-11-23T03:38:30+0000 lvl=warn msg="Pivot became stale, moving"             old=23,292,947 new=23,293,011
t=2022-11-23T03:38:30+0000 lvl=info msg="Imported new block receipts"            count=64  elapsed=87.428ms    number=23,293,010 hash=0x7d43a4f453525c0ea0a32d25aa923538443e0c228991a8594ab095b3c317e677 age=3m17s    size="5.22 MiB"
t=2022-11-23T03:38:30+0000 lvl=info msg="State heal in progress"                 accounts=737,[email protected] slots=1,733,[email protected] [email protected] nodes=151,795,[email protected] pending=47156
t=2022-11-23T03:38:32+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="662.783µs" number=23,293,074 hash=0x4b9d967ce1a722bce0fc9e3972fa334b2b5148a958b2bf569d1a51ad8ea590b5
t=2022-11-23T03:38:36+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="528.669µs" number=23,293,075 hash=0xb45ce20377912c88a2efe944129667d4dfdda0741f32f9edfddbfd82e43db5e2
t=2022-11-23T03:38:39+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="525.979µs" number=23,293,076 hash=0xe3d9fb0330ecd59a9a1dd0c3112636f68c73f4e0e0c9b776acdd09db6bf8e232
t=2022-11-23T03:38:39+0000 lvl=info msg="State heal in progress"                 accounts=737,[email protected] slots=1,733,[email protected] [email protected] nodes=151,800,[email protected] pending=19681
t=2022-11-23T03:38:42+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="789.245µs" number=23,293,077 hash=0xd7509d939399f650e0a4ecf360bc9fac143b8fcbbfa97859d3aca279079e9446
t=2022-11-23T03:38:45+0000 lvl=info msg="Imported new block headers"             count=1   elapsed="417.178µs" number=23,293,078 hash=0x07cf42df33a19d63032dc11392167a8c17d0824147c4cc500eec4cb21b9e2535
t=2022-11-23T03:38:47+0000 lvl=info msg="State heal in progress"                 accounts=737,[email protected] slots=1,733,[email protected] [email protected] nodes=151,802,[email protected] pending=23696

It is using 1.7 TB of disk space. I switched the --syncmode parameter to full after the initial sync and restarted, but once the initialization logs pass, it returns to the endless state heal loop.
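
One rough way to tell a slow heal from a true loop is to track the pending= counter on the "State heal in progress" lines over a longer window. The following is a minimal Python sketch, assuming the log format shown in the excerpt above; the function name `heal_pending_series` and the sample lines (abbreviated to the fields the parser needs) are illustrative only. The counter does not have to fall monotonically, so treat the trend as a hint rather than a verdict.

```python
import re
from datetime import datetime

# Matches geth-style lines such as:
#   t=2022-11-23T03:38:13+0000 lvl=info msg="State heal in progress" ... pending=51447
HEAL_RE = re.compile(
    r't=(?P<ts>\S+)\s+lvl=\w+\s+msg="State heal in progress".*?\bpending=(?P<pending>[\d,]+)'
)


def heal_pending_series(lines):
    """Return (timestamp, pending) pairs for every state-heal log line."""
    series = []
    for line in lines:
        match = HEAL_RE.search(line)
        if match is None:
            continue  # not a state-heal line (e.g. header imports)
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S%z")
        pending = int(match.group("pending").replace(",", ""))
        series.append((ts, pending))
    return series


# Sample input: timestamps and pending values taken from the excerpt above,
# with the other fields omitted for brevity.
sample = [
    't=2022-11-23T03:38:10+0000 lvl=info msg="Imported new block headers" count=1',
    't=2022-11-23T03:38:13+0000 lvl=info msg="State heal in progress" pending=51447',
    't=2022-11-23T03:38:21+0000 lvl=info msg="State heal in progress" pending=49316',
    't=2022-11-23T03:38:29+0000 lvl=info msg="State heal in progress" pending=47242',
]

series = heal_pending_series(sample)
first_ts, first_pending = series[0]
last_ts, last_pending = series[-1]
elapsed = (last_ts - first_ts).total_seconds()
print(f"pending changed by {last_pending - first_pending} over {elapsed:.0f}s")
```

In practice the same function can be fed a whole log file (for example `heal_pending_series(open(path))`, where `path` is wherever the node writes its log) to compare values hours apart instead of seconds apart.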

@0xChupaCabra

Same here on a 48-core server, with load average below 10%.

free -g
               total        used        free      shared  buff/cache   available
Mem:             754         316           6           0         431         432

In other discussions it has been said that this happens because the server does not have enough resources to catch up.
I need to sync from scratch because the snapshot has ancient data pruned.

@mosinb

mosinb commented Nov 24, 2022

state heal

Hi, the state heal seems like a loop but usually it is not. This process can take time to finish, sometimes days.


@0xChupaCabra Hi, you certainly have enough memory; try increasing --cache to a higher value. What IOPS does your storage deliver? And why do you need all the ancient data? If you need all the historical data of the blockchain, you may need to run an archive node.

Also, for this type of question I would recommend reaching out on Discord for a faster response: https://discord.gg/bnbchain

@0xChupaCabra

Thanks for the suggestion. I just deleted the whole chain and restarted the node with the --syncmode full option.
This server has 7TB NVMe drives; another one has 14TB SSDs.

@DaveWK
Author

DaveWK commented Nov 25, 2022

This happened to me recently with Ethereum geth (v1.10.25), and the suggestion was to use the rolling v1.11 version:
ethereum/go-ethereum#25865

After failing and getting stuck in state heal on v1.10.25, I can confirm that 1.11.0-unstable-1daea030 worked. You may be able to find the bug fix or improvement by bisecting the go-ethereum code.

@DaveWK
Author

DaveWK commented Nov 25, 2022

I did a little bit of looking, and I think I have narrowed it down to these 3 changes:
ethereum/go-ethereum#25651
ethereum/go-ethereum#25666
ethereum/go-ethereum#25694

@forcodedancing
Contributor

@DaveWK Thanks, we will look into it.

@forcodedancing
Contributor

@DaveWK A PR has been created: #1226. Thanks for your report and analysis.

@jacobpake

@DaveWK, have you tried with the PR? I'm running it now and am on the 3rd day of state heal.

@weiihann weiihann added the X-nodesync task filter for node sync issue: full, snap, light... label Dec 5, 2023