bug: RocksDB "busy" exceptions #3719

Closed
garyschulte opened this issue Apr 11, 2022 · 0 comments · Fixed by #3720

garyschulte commented Apr 11, 2022

Description

In a variety of circumstances Besu encounters a RocksDB "Busy" exception. This may be exacerbated by the use of OptimisticTransactionDB, which validates writes only at commit time. The "Busy" errors are especially pronounced on mainnet nodes during fast sync.
e.g.

{
   "timestamp":"2022-04-07T15:42:19,838",
   "level":"ERROR",
   "thread":"EthScheduler-Services-7 (batchPersistData)",
   "class":"FastWorldStateDownloadProcess",
   "message":"Pipeline failed",
   "throwable":"
        org.hyperledger.besu.plugin.services.exception.StorageException: org.rocksdb.RocksDBException: Busy
            at org.hyperledger.besu.plugin.services.storage.rocksdb.segmented.RocksDBColumnarKeyValueStorage$RocksDbTransaction.commit(RocksDBColumnarKeyValueStorage.java:287)
            at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageTransactionTransitionValidatorDecorator.commit(SegmentedKeyValueStorageTransactionTransitionValidatorDecorator.java:49)
            at org.hyperledger.besu.services.kvstore.SegmentedKeyValueStorageAdapter$1.commit(SegmentedKeyValueStorageAdapter.java:90)
            at org.hyperledger.besu.ethereum.bonsai.BonsaiWorldStateKeyValueStorage$Updater.commit(BonsaiWorldStateKeyValueStorage.java:333)
            at org.hyperledger.besu.ethereum.eth.sync.fastsync.worldstate.PersistDataStep.persist(PersistDataStep.java:53)
            at org.hyperledger.besu.ethereum.eth.sync.fastsync.worldstate.FastWorldStateDownloadProcess$Builder.lambda$build$3(FastWorldStateDownloadProcess.java:202)
            at org.hyperledger.besu.services.pipeline.MapProcessor.processNextInput(MapProcessor.java:31)
            at org.hyperledger.besu.services.pipeline.ProcessingStage.run(ProcessingStage.java:38)
            at org.hyperledger.besu.services.pipeline.Pipeline.lambda$runWithErrorHandling$3(Pipeline.java:152)
            at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
            at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:829)
        Caused by: org.rocksdb.RocksDBException: Busy
            at org.rocksdb.Transaction.commit(Native Method)
            at org.rocksdb.Transaction.commit(Transaction.java:206)
            at org.hyperledger.besu.plugin.services.storage.rocksdb.segmented.RocksDBColumnarKeyValueStorage$RocksDbTransaction.commit(RocksDBColumnarKeyValueStorage.java:281)
            ... 13 more"
}

We should either handle these exceptions and implement a retry strategy, or revert to the pessimistic TransactionDB.
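
For the retry option, a minimal sketch of what handling a "Busy" commit could look like. This is not Besu's actual code: the class name, retry bound, and single-key write are hypothetical, and it assumes the writes can safely be re-applied on a fresh transaction, since an optimistic transaction that fails validation cannot simply be re-committed.

```java
import org.rocksdb.OptimisticTransactionDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Status;
import org.rocksdb.Transaction;
import org.rocksdb.WriteOptions;

public final class BusyRetryExample {

  private static final int MAX_ATTEMPTS = 5; // hypothetical retry bound

  /** Re-applies the write on a fresh transaction whenever commit fails with Status.Code.Busy. */
  static void putWithRetry(final OptimisticTransactionDB db, final byte[] key, final byte[] value)
      throws RocksDBException {
    try (final WriteOptions writeOptions = new WriteOptions()) {
      for (int attempt = 1; ; attempt++) {
        try (final Transaction tx = db.beginTransaction(writeOptions)) {
          tx.put(key, value);
          try {
            tx.commit(); // optimistic validation happens here and may report Busy
            return;
          } catch (final RocksDBException e) {
            final Status status = e.getStatus();
            final boolean busy = status != null && status.getCode() == Status.Code.Busy;
            if (!busy || attempt >= MAX_ATTEMPTS) {
              throw e; // not a transient conflict, or retries exhausted
            }
            // otherwise loop around and redo the write on a brand-new transaction
          }
        }
      }
    }
  }
}
```

A real fix would need to re-apply the whole batch produced by the world state persist step (batchPersistData in the stack trace above), not a single put.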

Acceptance Criteria

  • Besu should handle storage exceptions gracefully and retry where possible
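
If retries alone cannot make the handling graceful in every case, the fallback mentioned in the description is reverting to the pessimistic TransactionDB. A rough sketch of opening and using one, with a placeholder path and default options (not Besu's storage factory code); pessimistic transactions take key locks when writing, so conflicting writers block or time out instead of failing commit validation with "Busy":

```java
import java.nio.charset.StandardCharsets;

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Transaction;
import org.rocksdb.TransactionDB;
import org.rocksdb.TransactionDBOptions;
import org.rocksdb.WriteOptions;

public final class PessimisticTxExample {

  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();

    // "/tmp/pessimistic-example" is a placeholder path for illustration only.
    try (final Options options = new Options().setCreateIfMissing(true);
        final TransactionDBOptions txDbOptions = new TransactionDBOptions();
        final TransactionDB db =
            TransactionDB.open(options, txDbOptions, "/tmp/pessimistic-example");
        final WriteOptions writeOptions = new WriteOptions();
        final Transaction tx = db.beginTransaction(writeOptions)) {
      // Pessimistic transactions lock the key on put(), so a conflicting writer
      // waits (or times out) here instead of failing validation at commit time.
      tx.put("key".getBytes(StandardCharsets.UTF_8), "value".getBytes(StandardCharsets.UTF_8));
      tx.commit();
    }
  }
}
```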

Steps to Reproduce (Bug)

  1. The easiest way to reproduce is to fast-sync mainnet with the Bonsai storage format

Expected behavior:
Fast sync completes.

Actual behavior:
Unhandled "Busy" exceptions cause the world state downloader to abort, while the Besu process stays up and continues downloading blocks. When sync completes, the world state is incomplete.

Frequency:
~100% on EC2 instances with 6000 or fewer IOPS.

Versions (Add all that apply)

  • Software version: 22.x