Synchronisation failed: Dropping peer #15067
Comments
update: ERROR[08-31|22:27:00] Failed to close database database=/home/adam/.ethereum/geth/chaindata err="leveldb/table: corruption on data-block (pos=1849286): checksum mismatch, want=0x66919f5e got=0x875cb029 [file=5152408.ldb]"
Same on Ubuntu 16.04 LTS, geth 1.7.0-stable-6c6c7b2a
Have a look at this: #15001... probably related
Same on Ubuntu 16.04, with geth-linux-amd64-1.7.0-6c6c7b2a. It turned out that my problem was at block 4370000, so updating to 1.7.3 can solve the problem.
Problem with "Synchronisation failed, dropping peer" i sync to the latest block, 20-30 min later i get this error, 5-10 min he tryies to reconnect then i have to sync again, i am 100-200 blocks behind while he was retrying to connect, this erorr repeats every 20-30 min. When i dont get this error i have the latest block, when i get this error it always leaves me 100-200 blocks behind because he wasnt connected, its annoying.
|
Same problem, Ubuntu 16.04
Same issue, Ubuntu 16.04 - is there a solution for this? Geth version 1.7.3 for me
Same issue here, Geth 1.7.3 on Windows 10
Same issue here, Ubuntu 16.04, version 1.7.3-stable
Debian 8, Geth 1.7.3-stable, the same issue. It first occurred yesterday; Geth had been running continuously for about two weeks before the problem appeared.
Ubuntu 16.04, Geth 1.7.3-stable (commit 4bb3c89). Same issue: if a peer has to be dropped because of a timeout, the whole blockchain sync freezes for ~1-2 minutes, messing up the whole process (as opposed to gently dropping one bad peer and downloading from others).
Ubuntu 16.04, running inside Docker on AWS with geth 1.7.3-stable, having the same problem. We have since moved our geth node to a different datacenter (not AWS) and syncing has been stable for the last few days. Are you perhaps also running inside AWS and seeing this issue?
Ubuntu 16.04, geth 1.7.3-stable, same issue. I'm running it on GCP, but it throws the same error on my Ubuntu 16.04 workstation at home.
Debian 8 64-bit, geth 1.7.3-stable
Same issue in Docker with ethereum/client-go:v1.7.2
Same issue on macOS High Sierra, on Ropsten.
Same thing here, Ubuntu 16.04, geth 1.8.0-unstable, go1.9.2
Same issue, Geth 1.8, Ubuntu 16.04
Please look at this detailed description of the issue: #15001 (comment)
Same issue on Geth 1.8.6. I get the same warning, after which the blockchain falls out of sync by 10-100 blocks. This happens every time sync catches up. (Downgrading to Geth 1.8.3 solved my problem for 3 weeks, before it started showing the same issues again.)
@wtfiwtz this is not explained by the @karalabe explanation in #15001 (comment). I would love to see an optimization that addresses the problem you succinctly laid out in #14647 (comment). There has to be a suitable built-in alternative to hosting numerous independent nodes in order to get robustness. Either a node waits too long for a response or it drops a peer too quickly; I haven't dug into the code enough (nor am I a Go developer) to figure out which is the case. There should be a mechanism to help prevent settling into a degraded network state where most peers in a subgroup are all behind together. This echo-chamber situation should trigger a clearing of peers and a reboot from the boot/static nodes. Maybe this is too difficult to implement, but it would go a very long way if possible. Perhaps it involves a nearest-neighbour analysis, which should be possible given that all of your peers' connections are discoverable.
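For illustration only, here is a minimal external watchdog sketch (Go, not part of geth) of the "clear peers and rediscover" idea from the comment above: if the local head stops advancing, it drops all connected peers over the admin RPC API so discovery can rebuild the peer set. It assumes geth is running with the admin API exposed over HTTP (e.g. started with `--rpc --rpcapi "eth,admin"`); the endpoint, polling interval, and 5-minute threshold are arbitrary placeholders, not recommendations.

```go
// Hypothetical watchdog sketch: drop all peers when the head stops moving.
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"strconv"
	"time"
)

const endpoint = "http://127.0.0.1:8545" // assumed local HTTP-RPC endpoint

type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

// call performs one JSON-RPC request and unmarshals the "result" field.
func call(method string, params []interface{}, result interface{}) error {
	if params == nil {
		params = []interface{}{}
	}
	body, err := json.Marshal(rpcRequest{JSONRPC: "2.0", Method: method, Params: params, ID: 1})
	if err != nil {
		return err
	}
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	var wrapper struct {
		Result json.RawMessage `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&wrapper); err != nil {
		return err
	}
	return json.Unmarshal(wrapper.Result, result)
}

// headNumber returns the node's current head block number via eth_blockNumber.
func headNumber() (uint64, error) {
	var hexNum string
	if err := call("eth_blockNumber", nil, &hexNum); err != nil {
		return 0, err
	}
	return strconv.ParseUint(hexNum, 0, 64) // base 0 handles the 0x prefix
}

func main() {
	var lastHead uint64
	stuckSince := time.Now()

	for range time.Tick(30 * time.Second) {
		head, err := headNumber()
		if err != nil {
			log.Printf("rpc error: %v", err)
			continue
		}
		if head > lastHead {
			lastHead, stuckSince = head, time.Now()
			continue
		}
		// Head has not advanced; after an arbitrary 5-minute threshold,
		// drop every connected peer and let discovery rebuild the set.
		if time.Since(stuckSince) > 5*time.Minute {
			var peers []struct {
				Enode string `json:"enode"`
			}
			if err := call("admin_peers", nil, &peers); err != nil {
				log.Printf("admin_peers failed: %v", err)
				continue
			}
			for _, p := range peers {
				var removed bool
				_ = call("admin_removePeer", []interface{}{p.Enode}, &removed)
			}
			log.Printf("head stuck at %d, dropped %d peers", head, len(peers))
			stuckSince = time.Now()
		}
	}
}
```

admin_peers and admin_removePeer are existing geth admin methods; whether aggressively dropping peers actually helps, rather than just churning connections, is exactly the open question in this thread.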
Experiencing the same issue on Ubuntu 18.04 LTS (Bionic Beaver) with a brand new SSD + 8GB RAM. I've been looping around block 1.3M for the last 48 hours (nearly 80 hours total sync time so far, after many other previous attempts) using: It's been nearly a year now since this issue was posted and there doesn't appear to be a working solution. I don't believe it can be safely dismissed as a byproduct of inefficient HDDs: many Ubuntu/Windows users have reported similar cases with consumer-grade laptops/PCs + SSDs. Given the hardware-centralization risks, security compromises, and UX nightmare this implies, shouldn't more dev resources be allocated to finding a solution? I ran Geth and Parity nodes in 2016/2017 and never faced this level of obstruction with syncing the chain. Please fix this.
Having the same issue as OP. I have a DApp running live; people pay, but I can't detect the payment because my geth stops syncing (with the sync-failed error). Is there any proper solution to this? It seems everyone gets stuck in this hell at some point.
Also experiencing this issue (on Windows + SSD!). Has any progress been made on finding the cause of this issue?
Related to #16825, which has a temporary solution.
It seems that the problem can be caused by memory corruption. In my case, after tuning the RAM to more conservative settings and resyncing from scratch, everything finally started working fine.
@mcgravier what was your reasoning behind the conclusion that it was caused by memory corruption?
@mathieumagalhaes I have a PC with a Ryzen 7 processor and 3200MHz memory. However, this particular processor is only guaranteed to work with 2666MHz memory; everything beyond that is considered overclocking and may not be stable. Running memtest for 24h reported memory errors, so I reduced the memory clock from 3200MHz to 2666MHz and all the issues disappeared. I can now run a Geth node for an extended amount of time without hitting the issue.
I have this same issue on GCP. I have tried firewall settings (allowing UDP and TCP traffic), rebooting, changing RAM/CPUs, etc. I believe memory corruption is unlikely, since I assume GCP gives me different hardware every time I reboot and/or change the amount of memory allocated to my VM. Has anyone looked into firewall TCP session timeouts? I am wondering whether there are common firewall settings (on either the local firewall or a peer's firewall) that cause connections to be dropped if they are inactive for a certain length of time (10 minutes on GCP), and whether Geth could run into that in the course of normal operation. I am currently testing this theory on GCP by changing keepalive settings, but given the intermittent nature of the issue it is difficult to be sure I am on the right track. Has anyone else looked into this?
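As a rough illustration of the keepalive idea above (a sketch to reason about it, not a fix), here is a minimal Go snippet that opens a long-lived TCP connection with an OS keepalive interval well below the assumed 10-minute idle cutoff. The target address and intervals are placeholders, not real peer settings.

```go
// Minimal keepalive sketch: keep an idle TCP session alive with OS probes.
package main

import (
	"log"
	"net"
	"time"
)

func main() {
	// Dial with a keepalive interval well below the suspected
	// 10-minute firewall idle timeout, so probes keep the session alive.
	d := net.Dialer{
		Timeout:   10 * time.Second,
		KeepAlive: 60 * time.Second, // placeholder probe interval
	}
	conn, err := d.Dial("tcp", "example.org:30303") // placeholder peer address
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
	log.Printf("connected to %s with keepalive enabled", conn.RemoteAddr())

	// Leave the connection idle; with keepalive probes a stateful
	// firewall should not silently drop it for inactivity.
	time.Sleep(30 * time.Minute)
}
```

If an intermediate firewall really is dropping idle sessions, shortening the keepalive interval (whether in the application or via the OS defaults) would be one way to test the theory without changing geth itself.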
Sorry, closing this because the report isn't actionable. There is no single bug in geth that causes sync failures. We are aware that sync may sometimes fail for networking reasons. |
System information
Geth version: 1.6.7
OS & Version: Linux (Ubuntu 17.04)
Actual behaviour
After several hours of running, geth fails to sync any further:
INFO [08-31|21:29:22] Imported new chain segment blocks=1 txs=87 mgas=6.720 elapsed=310.079ms mgasps=21.670 number=4224370 hash=a3660d…b7b5fb
INFO [08-31|21:29:41] Imported new chain segment blocks=1 txs=117 mgas=3.595 elapsed=467.968ms mgasps=7.683 number=4224371 hash=5ae6f1…edc294
INFO [08-31|21:30:10] Imported new chain segment blocks=1 txs=61 mgas=6.711 elapsed=175.206ms mgasps=38.301 number=4224372 hash=5abb29…2a0891
WARN [08-31|21:30:15] message loop peer=d8cb8306a528cf96 err=EOF
WARN [08-31|21:31:05] Synchronisation failed, dropping peer peer=11fdde20fc7831ef err="retrieved hash chain is invalid"
WARN [08-31|21:31:45] Synchronisation failed, dropping peer peer=c300581e16c7d233 err=timeout
WARN [08-31|21:32:30] Synchronisation failed, dropping peer peer=938199d61038ff42 err="retrieved hash chain is invalid"
WARN [08-31|21:35:01] Synchronisation failed, dropping peer peer=0fc5fe924314d328 err="retrieved hash chain is invalid"
After a client restart, geth manages to sync the latest blocks; after a few hours the issue repeats.
I'm running geth with the flags:
--rpc --shh --maxpeers 100 --lightserv 90 --cache 2048
The issue appeared today - I had been running the client in the background for days (or even weeks) without any problems before.