Description
In a variety of instances, Besu encounters a RocksDB "Busy" exception. This may be exacerbated by the use of OptimisticTransactionDB. The "Busy" errors are especially pronounced on mainnet nodes during fast-sync.
e.g.
{
"timestamp":"2022-04-07T15:42:19,838",
"level":"ERROR",
"thread":"EthScheduler-Services-7 (batchPersistData)",
"class":"FastWorldStateDownloadProcess",
"message":"Pipeline failed",
"throwable":"
org.hyperledger.besu.plugin.services.exception.StorageException: org.rocksdb.RocksDBException: Busy
at org.hyperledger.besu.plugin.services.storage.rocksdb.segmented.RocksDBColumnarKeyValueStorage$RocksDbTransaction.commit(RocksDBColumnarKeyValueStorage.java:287)
at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageTransactionTransitionValidatorDecorator.commit(SegmentedKeyValueStorageTransactionTransitionValidatorDecorator.java:49)
at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageAdapter$1.commit(SegmentedKeyValueStorageAdapter.java:90)
at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateKeyValueStorage$Updater.commit(BonsaiWorldStateKeyValueStorage.java:333)
at org.hyperledger.besu.ethereum.eth.sync.fastsync.worldstate.PersistDataStep.persist(PersistDataStep.java:53)
at org.hyperledger.besu.ethereum.eth.sync.fastsync.worldstate.FastWorldStateDownloadProcess$Builder.lambda$build$3(FastWorldStateDownloadProcess.java:202)
at org.hyperledger.besu.services.pipeline.MapProcessor.processNextInput(MapProcessor.java:31)
at org.hyperledger.besu.services.pipeline.ProcessingStage.run(ProcessingStage.java:38)
at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:152)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.rocksdb.RocksDBException: Busy
at org.rocksdb.Transaction.commit(Native Method)
at org.rocksdb.Transaction.commit(Transaction.java:206)
at org.hyperledger.besu.plugin.services.storage.rocksdb.segmented.RocksDBColumnarKeyValueStorage$RocksDbTransaction.commit(RocksDBColumnarKeyValueStorage.java:281)
... 13 more"
}
We should either handle these exceptions and implement a retry strategy, or revert to TransactionDB.
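As a sketch of the retry approach, something along the following lines could wrap the transactional work. The class and constant names below (BusyRetry, MAX_ATTEMPTS, BACKOFF_MS) are illustrative assumptions, not existing Besu code; only the RocksDB exception/status types are real API.

// Illustrative sketch only: a generic retry helper for operations that may fail
// with a transient RocksDB "Busy" status. Names and constants are assumptions,
// not existing Besu APIs.
import org.rocksdb.RocksDBException;
import org.rocksdb.Status;

final class BusyRetry {
  private static final int MAX_ATTEMPTS = 5;   // illustrative limit
  private static final long BACKOFF_MS = 100;  // illustrative pause between attempts

  interface RocksDbAction {
    void run() throws RocksDBException;
  }

  // Re-runs the whole transactional action on "Busy"; any other status, or
  // exhausting the attempt budget, rethrows the original exception.
  static void runWithRetry(final RocksDbAction action) throws RocksDBException {
    for (int attempt = 1; ; attempt++) {
      try {
        action.run();
        return;
      } catch (final RocksDBException e) {
        final Status status = e.getStatus();
        final boolean busy = status != null && status.getCode() == Status.Code.Busy;
        if (!busy || attempt >= MAX_ATTEMPTS) {
          throw e;
        }
        try {
          Thread.sleep(BACKOFF_MS);
        } catch (final InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw e;
        }
      }
    }
  }
}

Note that the sketch re-runs the whole build-and-commit of the transaction rather than re-issuing commit on the failed transaction, since an optimistic transaction that loses its conflict check at commit time generally needs to be rebuilt.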
Acceptance Criteria
Besu should handle storage exceptions gracefully and retry where possible
Steps to Reproduce (Bug)
The easiest way to reproduce is to fast-sync mainnet with the Bonsai storage format.
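For reference, a mainnet fast-sync on Bonsai can be started with flags along these lines (flag names as of the 22.x CLI; adjust for your environment):

besu --network=mainnet --sync-mode=FAST --data-storage-format=BONSAI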
Expected behavior:
fast-sync completes
Actual behavior:
Unhandled "Busy" exceptions cause the world state downloader to abend, but the Besu process stays up and continues to download blocks. When sync completes, the world state is incomplete.
Frequency:
~100% on EC2 instances with 6000 or fewer IOPS
Versions (Add all that apply)
Software version: 22.x