
Faster way to get block with prevouts in JSON-RPC #30495

Open
vostrnad opened this issue Jul 21, 2024 · 9 comments

@vostrnad

I often need to process the whole blockchain (or a large part of it) using an external script/program, for which I need blocks with prevout information included. However, the only current way to get that is getblock <hash> 3, which includes a lot of potentially unnecessary data and is quite slow, mainly (based on my experiments) because of UniValue overhead and descriptor inferring.

I benchmarked current master, retrieving 1000 blocks sequentially starting at block 840000, with different verbosity parameters:

| Benchmark              | Result             |
| ---------------------- | ------------------ |
| getblock (verbosity=0) | 16.189s ± 1.165s   |
| getblock (verbosity=1) | 31.975s ± 1.014s   |
| getblock (verbosity=2) | 352.487s ± 1.636s  |
| getblock (verbosity=3) | 473.375s ± 2.280s  |
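
A minimal sketch of such a sequential measurement loop (not the exact script used; assumes python-requests, placeholder RPC credentials, and an extra getblockhash call per block):

```python
import time
import requests

RPC_URL = "http://127.0.0.1:8332"
AUTH = ("rpcuser", "rpcpassword")  # placeholder credentials

def rpc(method, params):
    resp = requests.post(RPC_URL, auth=AUTH,
                         json={"jsonrpc": "1.0", "id": 0,
                               "method": method, "params": params})
    resp.raise_for_status()
    return resp.json()["result"]

def bench(verbosity, start=840000, count=1000):
    # Time `count` sequential getblock calls at the given verbosity.
    t0 = time.monotonic()
    for height in range(start, start + count):
        rpc("getblock", [rpc("getblockhash", [height]), verbosity])
    return time.monotonic() - t0

for v in (0, 1, 2, 3):
    print(f"verbosity={v}: {bench(v):.3f}s")
```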

As you can see, verbosity=3 is around 30 times slower than verbosity=0. It seems obvious that a faster way of getting blocks with prevout information is feasible.

Potential solutions that come to mind:

  • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn't changed in many years).
  • Creating a new verbosity level for getblock that would only provide the minimum amount of data necessary (i.e. no addresses, descriptors, ASM scripts, TXIDs/WTXIDs etc.) while still providing prevouts. This would be better than nothing but would still leave a lot of performance on the table because of UniValue overhead.
@andrewtoth
Contributor

andrewtoth commented Jul 30, 2024

There are a few strategies to speed this up on the client side instead:

  • Fetch blocks concurrently
  • Fetch blocks in parallel
  • Fetch blocks in batch requests
  • A combination of all of the above

Setting rpcthreads to a number higher than the default of 4 will also let you make more requests concurrently or in parallel.
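
For illustration, a minimal Python sketch of the thread-pool strategy (assumes python-requests and placeholder RPC credentials; keep the pool size at or below the node's rpcthreads setting):

```python
from concurrent.futures import ThreadPoolExecutor
import requests

RPC_URL = "http://127.0.0.1:8332"
AUTH = ("rpcuser", "rpcpassword")  # placeholder credentials

def rpc(method, params):
    resp = requests.post(RPC_URL, auth=AUTH,
                         json={"jsonrpc": "1.0", "id": 0,
                               "method": method, "params": params})
    resp.raise_for_status()
    return resp.json()["result"]

def get_block(height, verbosity=3):
    return rpc("getblock", [rpc("getblockhash", [height]), verbosity])

# Eight workers; requires rpcthreads >= 8 on the node to run fully parallel.
with ThreadPoolExecutor(max_workers=8) as pool:
    blocks = list(pool.map(get_block, range(840000, 841000)))
```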

@maflcko
Member

maflcko commented Aug 8, 2024

#30595 mentions "Traversing the block index as well and using block index entries for reading block and undo data." However, it does not return JSON over RPC but a kernel_BlockUndo*/BlockUndo; the pull is also experimental, has no versioning, and has some other drawbacks. (Just mentioning it for context, because if you care about speed, this may be faster than JSON.)

@ismaelsadeeq
Member

ismaelsadeeq commented Oct 29, 2024

I also noticed that using getblock sequentially on a large number of blocks was slow while checking for clusters of size > 2 in previously mined blocks; see #30079 (comment).

To investigate further, I conducted a benchmark on a VPS with the following specs:

  • 8 vCPU Cores, 24 GB RAM, 1.2 TB SSD, 32 TB Traffic
  • Running Ubuntu 22 with Bitcoin Core on latest master da10e0b

I used a script to retrieve 1000 blocks starting at block 840000, testing:

  • Verbosity levels 1, 2, and 3
  • Using sequential and then thread-pool strategies, as @andrewtoth hinted
  • Running 3 iterations

Benchmark Results

Verbosity 1

| Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean    | Standard Deviation |
| ----------- | ----------- | ----------- | ----------- | ------- | ------------------ |
| Sequential  | 202 sec     | 118 sec     | 119 sec     | 146 sec | 39 sec             |
| Thread Pool | 51 sec      | 52 sec      | 54 sec      | 53 sec  | 1 sec              |

Verbosity 2

| Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean     | Standard Deviation |
| ----------- | ----------- | ----------- | ----------- | -------- | ------------------ |
| Sequential  | 5004 sec    | 3517 sec    | 4952 sec    | 4491 sec | 689 sec            |
| Thread Pool | 1248 sec    | 1289 sec    | 1298 sec    | 1279 sec | 22 sec             |

Verbosity 3

| Strategy    | Iteration 1 | Iteration 2 | Iteration 3 | Mean     | Standard Deviation |
| ----------- | ----------- | ----------- | ----------- | -------- | ------------------ |
| Sequential  | 4145 sec    | 4175 sec    | 4187 sec    | 4169 sec | 18 sec             |
| Thread Pool | 1591 sec    | 1564 sec    | 1587 sec    | 1581 sec | 12 sec             |

The benchmark results showed a ~62–72% reduction in execution time when using parallel threading (the thread-pool runs took roughly 28–38% of the sequential times), which confirms the potential of client-side threading to improve speed.
However, further performance gains would still benefit users who need large block sets for data analysis, e.g. the whole blockchain.


I reviewed the getblock RPC implementation and noticed that all resources are moved when calling UniValue's pushKV, which was nice; pushKV also moves the values internally. In getblock, the pushes to UniValue that were not moved explicitly were moved implicitly due to copy elision.

edit:
However, I noticed that space for the block transactions in UniValue was not reserved up front, so appending entries one at a time likely causes reallocation overhead.

Adding a .reserve member function to UniValue can prevent this. I added the function and benchmarked again to see if there was a performance improvement. The results showed slightly reduced mean times, particularly at verbosity level 1.

@andrewtoth
Contributor

andrewtoth commented Oct 29, 2024

@ismaelsadeeq nice find!

I wonder, could you also benchmark batch requests? That is, sending a single request containing as many getblock calls as rpcthreads, both sequentially and multithreaded on the client side?
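
For reference, a JSON-RPC batch is a single HTTP POST whose body is a JSON array of calls, which Bitcoin Core's RPC server accepts. A minimal client-side sketch (placeholder credentials; batch size matching the default rpcthreads of 4):

```python
import requests

RPC_URL = "http://127.0.0.1:8332"
AUTH = ("rpcuser", "rpcpassword")  # placeholder credentials

def getblock_batch(hashes, verbosity=3, batch_size=4):
    """Fetch blocks using `batch_size` getblock calls per HTTP request."""
    results = []
    for i in range(0, len(hashes), batch_size):
        batch = [{"jsonrpc": "1.0", "id": j, "method": "getblock",
                  "params": [h, verbosity]}
                 for j, h in enumerate(hashes[i:i + batch_size])]
        resp = requests.post(RPC_URL, json=batch, auth=AUTH)
        resp.raise_for_status()
        # Restore call order in case the server reorders responses.
        results.extend(r["result"]
                       for r in sorted(resp.json(), key=lambda r: r["id"]))
    return results
```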

@josibake
Member

josibake commented Nov 4, 2024

I think there are two separate topics here:

  1. "I need to process the entire blockchain for [an external application like electrs, data analysis, etc]"
  2. We can probably make the JSON-RPC faster, via threading, batching, etc

For 1., @vostrnad have you seen #30595? For the specific ask of prevouts, I'm almost certain this will always be faster, since the kernel API provides the prevouts by reading the rev.dat files (admittedly, I haven't looked into how this is done with the getblock RPC; it might also be doing the same).

Here is an example program I wrote using the kernel API via rust bindings: https://github.com/josibake/silent-payments-scanner/blob/74f883c370a26e2eaa5a1a7e8e18643e07ce2cff/src/scanner.rs#L135

I found this very easy to write and incredibly performant. The nice thing about using the kernel API for this is you can use whatever language you want (so long as that language supports C-bindings), and it does not require a running bitcoind process to be able to process the block files, which seems well suited to the data analysis / index building use case.

For experimenting / testing the API, there is https://github.com/theCharlatan/rust-bitcoinkernel, and I've also been meaning to create some python bindings, as well. If this is of interest to you, I'd be happy to explain more and of course would love your feedback on the C API PR.

@ismaelsadeeq
Member

Thank you, @josibake, for highlighting this! I performed some benchmarks to evaluate the performance of using the kernel API.

As you claimed, it is indeed more performant.

Benchmark Results:

I used the libbitcoinkernel library to replicate the block data extraction for the same interval (block heights 840000 to 841000). The average execution times are as follows:

  1. Rust bindings: ~87 seconds, tested using https://github.com/ismaelsadeeq/rs-blockparser

  2. Python bindings: ~612 seconds, tested using https://github.com/ismaelsadeeq/py-blockparser

For the Python bindings, I suspect the inefficiency arises from the deserialization handled by https://github.com/petertodd/python-bitcoinlib, because without the deserialization the execution time drops significantly to around 62 seconds, which is much closer to the Rust result.

The block data returned by these methods is equivalent to the getblock RPC with verbosity level 2.
For the extra data at verbosity level 3 (the prevouts, which come from the undo data), you can parse the undo files directly. Given the benchmark results, this approach is most likely faster than using the getblock RPC.
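
For context, the undo files' on-disk framing is simple, but it is an internal, unversioned format, so the layout assumed below (network magic, 4-byte little-endian payload length, serialized CBlockUndo, 32-byte checksum) reflects the current code and may change. A minimal sketch that walks that framing without decoding the CBlockUndo payload itself (which uses compressed Coin serialization):

```python
import struct
from pathlib import Path

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet message start bytes

def undo_payloads(path):
    """Yield raw serialized CBlockUndo blobs from a rev*.dat file.

    Assumed (internal) framing per record:
    magic (4) | payload length (4, LE) | CBlockUndo payload | checksum (32).
    """
    data = Path(path).expanduser().read_bytes()
    pos = 0
    while pos + 8 <= len(data):
        if data[pos:pos + 4] != MAINNET_MAGIC:
            break  # zero padding after the last written record
        (size,) = struct.unpack_from("<I", data, pos + 4)
        yield data[pos + 8:pos + 8 + size]
        pos += 8 + size + 32  # skip payload and trailing checksum

for blob in undo_payloads("~/.bitcoin/blocks/rev00000.dat"):
    ...  # decode the CBlockUndo (compressed Coins) here
```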

But this approach has some downsides, I think:

  1. Data directory access: the libbitcoinkernel approach requires bitcoind to be shut down, as multiple clients cannot access the datadir simultaneously.
  2. Sequential process: only one parser can run at a time, whereas RPC calls allow asynchronous execution, enabling multiple clients to access the RPC interface simultaneously while bitcoind is still running.

The language bindings (Rust and Python) make it straightforward to build blockchain parsers and other applications, which is a significant advantage. However, this should not deter us from improving the performance of RPC calls, as they remain a widely used interface for clients. Any opportunity to optimize performance, like #31490, #31539 and #31179, would benefit a broader range of users.

I think this result is convincing enough to close this issue @vostrnad

@romanz
Contributor

romanz commented Dec 21, 2024

  • Creating a new RPC call for undo data, say getblockundo. This would be perfect for my needs, but it would require making the undo data serialization format non-internal (not sure if this would be a problem, as IIRC it hasn't changed in many years).

By adding a new REST endpoint for fetching block prevouts, it seems that we can get quite a good throughput rate when reading the data concurrently in binary format (tested with ab by fetching a single block 10k times over 4 concurrent connections):

$ ab -k -c 4 -n 10000 http://localhost:8332/rest/block/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
...
Document Path:          /rest/block/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
Document Length:        2325617 bytes

Concurrency Level:      4
Time taken for tests:   18.742 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      23257250000 bytes
HTML transferred:       23256170000 bytes
Requests per second:    533.56 [#/sec] (mean)
Time per request:       7.497 [ms] (mean)
Time per request:       1.874 [ms] (mean, across all concurrent requests)
Transfer rate:          1211837.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     4    7   1.6      8      20
Waiting:        2    4   0.9      4      14
Total:          4    7   1.6      8      20

Percentage of the requests served within a certain time (ms)
  50%      8
  66%      8
  75%      9
  80%      9
  90%     10
  95%     10
  98%     11
  99%     11
 100%     20 (longest request)


$ ab -k -c 4 -n 10000 http://localhost:8332/rest/spentoutputs/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
...
Document Path:          /rest/spentoutputs/0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5.bin
Document Length:        151898 bytes

Concurrency Level:      4
Time taken for tests:   4.804 seconds
Complete requests:      10000
Failed requests:        0
Keep-Alive requests:    10000
Total transferred:      1520050000 bytes
HTML transferred:       1518980000 bytes
Requests per second:    2081.80 [#/sec] (mean)
Time per request:       1.921 [ms] (mean)
Time per request:       0.480 [ms] (mean, across all concurrent requests)
Transfer rate:          309027.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     2    2   0.3      2      11
Waiting:        2    2   0.2      2      10
Total:          2    2   0.3      2      11

Percentage of the requests served within a certain time (ms)
  50%      2
  66%      2
  75%      2
  80%      2
  90%      2
  95%      2
  98%      2
  99%      3
 100%     11 (longest request)
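
The same kind of concurrent binary fetch is easy to script on the client side; a minimal Python sketch (assumes the node runs with -rest=1; /rest/block exists today, while /rest/spentoutputs is the endpoint proposed above):

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import requests

BASE = "http://127.0.0.1:8332/rest"
local = threading.local()

def session():
    # One keep-alive session per worker thread, like ab -k.
    if not hasattr(local, "s"):
        local.s = requests.Session()
    return local.s

def fetch(endpoint, blockhash):
    # endpoint: "block" (existing) or "spentoutputs" (proposed above)
    resp = session().get(f"{BASE}/{endpoint}/{blockhash}.bin")
    resp.raise_for_status()
    return resp.content

H = "0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5"
with ThreadPoolExecutor(max_workers=4) as pool:
    sizes = [len(b) for b in pool.map(lambda _: fetch("spentoutputs", H),
                                      range(10000))]
```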

@vostrnad WDYT?

@shivaenigma

shivaenigma commented Jan 13, 2025

I am also interested in this. Even the getrawblock response takes 2-5 seconds depending on block size, which is very high. A few low-hanging fruits that could be implemented:

  • Support binding to unix domain sockets
  • Support Gzip compression

@pinheadmz
Member

  • Support binding to unix domain sockets

This is a common request, but requires replacing libevent, which I'm working on: #31194
