
Faster way to get block with prevouts in JSON-RPC #30495

Open
vostrnad opened this issue Jul 21, 2024 · 9 comments

@vostrnad

I often need to process the whole blockchain (or a large part of it) using an external script/program, for which I need blocks with prevout information included. However, the only current way to get that is getblock <hash> 3, which includes a lot of potentially unnecessary data and is quite slow, mainly (based on my experiments) because of UniValue overhead and descriptor inferring.

I benchmarked current master, retrieving 1000 blocks sequentially starting at block 840000, with different verbosity parameters:

| Benchmark              | Result             |
| ---------------------- | ------------------ |
| getblock (verbosity=0) | 16.189s ± 1.165s   |
| getblock (verbosity=1) | 31.975s ± 1.014s   |
| getblock (verbosity=2) | 352.487s ± 1.636s  |
| getblock (verbosity=3) | 473.375s ± 2.280s  |
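
A minimal sketch of such a sequential measurement loop (not the exact script used; assumes python-requests, placeholder RPC credentials, and an extra getblockhash call per block):

```python
import time
import requests

RPC_URL = "http://127.0.0.1:8332"
AUTH = ("rpcuser", "rpcpassword")  # placeholder credentials

def rpc(method, params):
    resp = requests.post(RPC_URL, auth=AUTH,
                         json={"jsonrpc": "1.0", "id": 0,
                               "method": method, "params": params})
    resp.raise_for_status()
    return resp.json()["result"]

def bench(verbosity, start=840000, count=1000):
    # Time `count` sequential getblock calls at the given verbosity.
    t0 = time.monotonic()
    for height in range(start, start + count):
        rpc("getblock", [rpc("getblockhash", [height]), verbosity])
    return time.monotonic() - t0

for v in (0, 1, 2, 3):
    print(f"verbosity={v}: {bench(v):.3f}s")
```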

As you can see, verbosity=3 is around 30 times slower than verbosity=0. It seems obvious that a faster way of getting blocks with prevout information is feasible.

Potential solutions that come to mind:

  • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn't changed in many years).
  • Creating a new verbosity level for getblock that would only provide the minimum amount of data necessary (i.e. no addresses, descriptors, ASM scripts, TXIDs/WTXIDs etc.) while still providing prevouts. This would be better than nothing but would still leave a lot of performance on the table because of UniValue overhead.
@andrewtoth
Contributor

andrewtoth commented Jul 30, 2024

There are a few strategies to speed this up on the client side instead:

  • Fetch blocks concurrently
  • Fetch blocks in parallel
  • Fetch blocks in batch requests
  • A combination of all of the above

Setting rpcthreads to a number higher than the default of 4 will also let you make more requests concurrently or in parallel.
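
For illustration, a minimal Python sketch of the thread-pool strategy (assumes python-requests and placeholder RPC credentials; keep the pool size at or below the node's rpcthreads setting):

```python
from concurrent.futures import ThreadPoolExecutor
import requests

RPC_URL = "http://127.0.0.1:8332"
AUTH = ("rpcuser", "rpcpassword")  # placeholder credentials

def rpc(method, params):
    resp = requests.post(RPC_URL, auth=AUTH,
                         json={"jsonrpc": "1.0", "id": 0,
                               "method": method, "params": params})
    resp.raise_for_status()
    return resp.json()["result"]

def get_block(height, verbosity=3):
    return rpc("getblock", [rpc("getblockhash", [height]), verbosity])

# Eight workers; requires rpcthreads >= 8 on the node to run fully parallel.
with ThreadPoolExecutor(max_workers=8) as pool:
    blocks = list(pool.map(get_block, range(840000, 841000)))
```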

@maflcko
Member

maflcko commented Aug 8, 2024

#30595 mentions "Traversing the block index as well and using block index entries for reading block and undo data." However, it does not return JSON over RPC but a kernel_BlockUndo*/BlockUndo; the pull is also experimental, has no versioning, and has some other drawbacks. (Just mentioning it for context, because if you care about speed, this may be faster than JSON.)

@ismaelsadeeq
Member

ismaelsadeeq commented Oct 29, 2024

I also noticed that using getblock sequentially on a large number of blocks was slow while checking for clusters of size > 2 in previously mined blocks; see #30079 (comment).

To investigate further, I conducted a benchmark on a VPS with the following specs:

  • 8 vCPU Cores, 24 GB RAM, 1.2 TB SSD, 32 TB Traffic
  • Running Ubuntu 22 with Bitcoin Core on latest master da10e0b

I used a script to retrieve 1000 blocks starting at block 840000, testing:

  • Verbosity levels 1, 2, and 3
  • Using sequential and then thread-pool strategies, as @andrewtoth hinted
  • Running 3 iterations

Benchmark Results

Verbosity 1

| Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean    | Standard Deviation |
| ----------- | ----------- | ----------- | ----------- | ------- | ------------------ |
| Sequential  | 202 sec     | 118 sec     | 119 sec     | 146 sec | 39 sec             |
| Thread Pool | 51 sec      | 52 sec      | 54 sec      | 53 sec  | 1 sec              |

Verbosity 2

| Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean     | Standard Deviation |
| ----------- | ----------- | ----------- | ----------- | -------- | ------------------ |
| Sequential  | 5004 sec    | 3517 sec    | 4952 sec    | 4491 sec | 689 sec            |
| Thread Pool | 1248 sec    | 1289 sec    | 1298 sec    | 1279 sec | 22 sec             |

Verbosity 3

| Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean     | Standard Deviation |
| ----------- | ----------- | ----------- | ----------- | -------- | ------------------ |
| Sequential  | 4145 sec    | 4175 sec    | 4187 sec    | 4169 sec | 18 sec             |
| Thread Pool | 1591 sec    | 1564 sec    | 1587 sec    | 1581 sec | 12 sec             |

The benchmark results showed a ~62–72% reduction in execution time when using parallel threading (the thread-pool runs took roughly 28–38% of the sequential times), which confirms the potential of client-side threading to improve speed.
However, further performance gains would still benefit users who need large block sets for data analysis, e.g. the whole blockchain.


I reviewed the getblock RPC implementation and noticed that all resources are moved when calling UniValue's pushKV, which was nice; pushKV also moves the values internally. In getblock, the pushes to UniValue that were not moved explicitly were moved implicitly due to copy elision.

edit:
However, I noticed that space for the block transactions in UniValue was not reserved up front, so appending entries one at a time likely causes reallocation overhead.

Adding a .reserve member function to UniValue can prevent this. I added the function and benchmarked again to see if there was a performance improvement. The results showed slightly reduced mean times, particularly at verbosity level 1.

@andrewtoth
Contributor

andrewtoth commented Oct 29, 2024

@ismaelsadeeq nice find!

I wonder, could you also benchmark batch requests? That is, sending a single request containing as many getblock calls as rpcthreads, both sequentially and multithreaded on the client side?
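
For reference, a JSON-RPC batch is a single HTTP POST whose body is a JSON array of calls, which Bitcoin Core's RPC server accepts. A minimal client-side sketch (placeholder credentials; batch size matching the default rpcthreads of 4):

```python
import requests

RPC_URL = "http://127.0.0.1:8332"
AUTH = ("rpcuser", "rpcpassword")  # placeholder credentials

def getblock_batch(hashes, verbosity=3, batch_size=4):
    """Fetch blocks using `batch_size` getblock calls per HTTP request."""
    results = []
    for i in range(0, len(hashes), batch_size):
        batch = [{"jsonrpc": "1.0", "id": j, "method": "getblock",
                  "params": [h, verbosity]}
                 for j, h in enumerate(hashes[i:i + batch_size])]
        resp = requests.post(RPC_URL, json=batch, auth=AUTH)
        resp.raise_for_status()
        # Restore call order in case the server reorders responses.
        results.extend(r["result"]
                       for r in sorted(resp.json(), key=lambda r: r["id"]))
    return results
```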

@josibake
Member

josibake commented Nov 4, 2024

I think there are two separate topics here:

  1. "I need to process the entire blockchain for [an external application like electrs, data analysis, etc]"
  2. We can probably make the JSON-RPC faster, via threading, batching, etc

For 1., @vostrnad have you seen #30595? For the specific ask of prevouts, I'm almost certain this will always be faster, since the kernel API provides the prevouts by reading the rev.dat files (admittedly, I haven't looked into how this is done with the getblock RPC; it might also be doing the same).

Here is an example program I wrote using the kernel API via rust bindings: https://github.com/josibake/silent-payments-scanner/blob/74f883c370a26e2eaa5a1a7e8e18643e07ce2cff/src/scanner.rs#L135

I found this very easy to write and incredibly performant. The nice thing about using the kernel API for this is you can use whatever language you want (so long as that language supports C-bindings), and it does not require a running bitcoind process to be able to process the block files, which seems well suited to the data analysis / index building use case.

For experimenting / testing the API, there is https://github.com/theCharlatan/rust-bitcoinkernel, and I've also been meaning to create some python bindings, as well. If this is of interest to you, I'd be happy to explain more and of course would love your feedback on the C API PR.

@ismaelsadeeq
Member

Thank you, @josibake, for highlighting this! I performed some benchmarks to evaluate the performance of using the kernel API.

As you claimed, it is indeed more performant.

Benchmark Results:

I used the libbitcoinkernel library to replicate the block data extraction for the same interval (block heights 840000 to 841000). The average execution times are as follows:

  1. Rust bindings: ~87 seconds, tested using https://github.com/ismaelsadeeq/rs-blockparser

  2. Python bindings: ~612 seconds, tested using https://github.com/ismaelsadeeq/py-blockparser

For the Python bindings, I suspect the inefficiency arises from the deserialization handled by https://github.com/petertodd/python-bitcoinlib, because without the deserialization the execution time drops significantly to around 62 seconds, which is much closer to the Rust result.

The block data returned by these methods is equivalent to the getblock RPC with verbosity level 2.
For the extra data at verbosity level 3 (the prevouts, which come from the undo data), you can parse the undo files directly. Given the benchmark results, this approach is most likely faster than using the getblock RPC.
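
For context, the undo files' on-disk framing is simple, but it is an internal, unversioned format, so the layout assumed below (network magic, 4-byte little-endian payload length, serialized CBlockUndo, 32-byte checksum) reflects the current code and may change. A minimal sketch that walks that framing without decoding the CBlockUndo payload itself (which uses compressed Coin serialization):

```python
import struct
from pathlib import Path

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet message start bytes

def undo_payloads(path):
    """Yield raw serialized CBlockUndo blobs from a rev*.dat file.

    Assumed (internal) framing per record:
    magic (4) | payload length (4, LE) | CBlockUndo payload | checksum (32).
    """
    data = Path(path).expanduser().read_bytes()
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != MAINNET_MAGIC:
            break  # zero padding after the last written record
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield data[pos + 8:pos + 8 + size]
        pos += 8 + size + 32  # skip payload and trailing checksum

for blob in undo_payloads("~/.bitcoin/blocks/rev00000.dat"):
    ...  # decode the CBlockUndo (compressed Coins) here
```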

But this approach has some downsides, I think:

  1. Data directory access: the libbitcoinkernel approach requires bitcoind to be shut down, as multiple clients cannot access the datadir simultaneously.
  2. Sequential process: only one parser can run at a time, whereas RPC calls allow asynchronous execution, enabling multiple clients to access the RPC interface simultaneously while bitcoind is still running.

The language bindings (Rust and Python) make it straightforward to build blockchain parsers and other applications, which is a significant advantage. However, this should not deter us from improving the performance of RPC calls, as they remain a widely used interface for clients. Any opportunity to optimize performance, like #31490, #31539 and #31179, would benefit a broader range of users.

I think this result is convincing enough to close this issue @vostrnad

@romanz
Contributor

romanz commented Dec 21, 2024

  • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn't changed in many years).

By adding a new REST endpoint for fetching block prevouts, it seems that we can get quite a good throughput rate when reading the data concurrently in binary format (tested with ab by fetching a single block 10k times over 4 concurrent connections):

$ ab -k -c 4 -n 10000 http://localhost:8332/rest/block/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
...
Document Path:          /rest/block/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
Document Length:        2325617 bytes

Concurrency Level:      4
Time taken for tests:   18.742 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      23257250000 bytes
HTML transferred:       23256170000 bytes
Requests per second:    533.56 [#/sec] (mean)
Time per request:       7.497 [ms] (mean)
Time per request:       1.874 [ms] (mean, across all concurrent requests)
Transfer rate:          1211837.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     4    7   1.6      8      20
Waiting:        2    4   0.9      4      14
Total:          4    7   1.6      8      20

Percentage of the requests served within a certain time (ms)
  50%      8
  66%      8
  75%      9
  80%      9
  90%     10
  95%     10
  98%     11
  99%     11
 100%     20 (longest request)


$ ab -k -c 4 -n 10000 http://localhost:8332/rest/spentoutputs/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
...
Document Path:          /rest/spentoutputs/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
Document Length:        151898 bytes

Concurrency Level:      4
Time taken for tests:   4.804 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      1520050000 bytes
HTML transferred:       1518980000 bytes
Requests per second:    2081.80 [#/sec] (mean)
Time per request:       1.921 [ms] (mean)
Time per request:       0.480 [ms] (mean, across all concurrent requests)
Transfer rate:          309027.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     2    2   0.3      2      11
Waiting:        2    2   0.2      2      10
Total:          2    2   0.3      2      11

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      2
  90%      2
  95%      2
  98%      2
  99%      3
 100%     11 (longest request)
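
The same kind of concurrent binary fetch is easy to script on the client side; a minimal Python sketch (assumes the node runs with -rest=1; /rest/block exists today, while /rest/spentoutputs is the endpoint proposed above):

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import requests

BASE = "http://127.0.0.1:8332/rest"
local = threading.local()

def session():
    # One keep-alive session per worker thread, like ab -k.
    if not hasattr(local, "s"):
        local.s = requests.Session()
    return local.s

def fetch(endpoint, blockhash):
    # endpoint: "block" (existing) or "spentoutputs" (proposed above)
    resp = session().get(f"{BASE}/{endpoint}/{blockhash}.bin")
    resp.raise_for_status()
    return resp.content

H = "0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5"
with ThreadPoolExecutor(max_workers=4) as pool:
    sizes = [len(b) for b in pool.map(lambda _: fetch("spentoutputs", H),
                                      range(10000))]
```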

@vostrnad WDYT?

@shivaenigma

shivaenigma commented Jan 13, 2025

I am also interested in this. Even the getrawblock response takes 2-5 seconds depending on block size, which is very high. A few low-hanging fruits that could be implemented:

  • Support binding to unix domain sockets
  • Support Gzip compression

@pinheadmz
Member

  • Support binding to unix domain sockets

This is a common request, but requires replacing libevent, which I'm working on: #31194
