Parallel read performance degradation #72
Comments
I re-imported the data in a more organized way (importing the CSV files one by one, organized by year). DatabendQuery v-0.1.0-eb581cd-simd(1.60.0-nightly-2022-02-27T09:20:55.414288628+00:00)
fuse_history snapshot
Explain a query(partitions_scanned: 61, partitions_total: 203)
T0:
These may be useful to you; let me know if you would like me to provide other information. @dantengsky @Xuanwo |
I'm working on addressing this issue now. |
After testing, the new version (v0.1.3) does not address this issue; we still need to locate the cause. |
We have a
On my local benchmark with s3 over minio, it seems that opendal works well with parallel reads:
s3_parallel/parallel_range_read_1
time: [2.6362 ms 2.8162 ms 3.0113 ms]
thrpt: [2.5944 GiB/s 2.7741 GiB/s 2.9635 GiB/s]
s3_parallel/parallel_range_read_2
time: [3.3120 ms 3.3710 ms 3.4387 ms]
thrpt: [4.5439 GiB/s 4.6351 GiB/s 4.7177 GiB/s]
s3_parallel/parallel_range_read_4
time: [4.5608 ms 4.6776 ms 4.7988 ms]
thrpt: [6.5120 GiB/s 6.6808 GiB/s 6.8518 GiB/s]
s3_parallel/parallel_range_read_8
time: [8.9568 ms 9.3281 ms 9.6966 ms]
thrpt: [6.4455 GiB/s 6.7002 GiB/s 6.9779 GiB/s]
s3_parallel/parallel_range_read_16
time: [19.478 ms 20.004 ms 20.549 ms]
thrpt: [6.0831 GiB/s 6.2487 GiB/s 6.4173 GiB/s]
We set runtime threads to
Benchmark on
fs_parallel/parallel_range_read_1
time: [1.4370 ms 1.4634 ms 1.5090 ms]
thrpt: [5.1771 GiB/s 5.3386 GiB/s 5.4365 GiB/s]
fs_parallel/parallel_range_read_2
time: [1.3994 ms 1.4775 ms 1.5524 ms]
thrpt: [10.065 GiB/s 10.575 GiB/s 11.166 GiB/s]
fs_parallel/parallel_range_read_4
time: [3.4201 ms 3.4835 ms 3.5609 ms]
thrpt: [8.7758 GiB/s 8.9708 GiB/s 9.1372 GiB/s]
fs_parallel/parallel_range_read_8
time: [9.9594 ms 10.172 ms 10.378 ms]
thrpt: [6.0225 GiB/s 6.1443 GiB/s 6.2755 GiB/s]
fs_parallel/parallel_range_read_16
time: [22.447 ms 22.953 ms 23.471 ms]
thrpt: [5.3257 GiB/s 5.4460 GiB/s 5.5688 GiB/s] |
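To make the scaling in these numbers easier to read, here is a small Python sketch that derives the implied bytes per read and the throughput relative to a single reader from the median figures quoted above. It assumes that `thrpt` is total bytes read divided by elapsed time and that GiB means 2^30 bytes; the function and variable names are illustrative and not part of opendal or the benchmark.

```python
# Median time (ms) and median throughput (GiB/s), copied from the results above.
S3_RESULTS = {
    1: (2.8162, 2.7741),
    2: (3.3710, 4.6351),
    4: (4.6776, 6.6808),
    8: (9.3281, 6.7002),
    16: (20.004, 6.2487),
}
FS_RESULTS = {
    1: (1.4634, 5.3386),
    2: (1.4775, 10.575),
    4: (3.4835, 8.9708),
    8: (10.172, 6.1443),
    16: (22.953, 5.4460),
}


def total_mib(time_ms, thrpt_gib_s):
    """Bytes moved per iteration in MiB, assuming thrpt = bytes / time."""
    return thrpt_gib_s * 1024 * (time_ms / 1000)


def summarize(results):
    """Return (parallelism, total MiB, MiB per read, throughput relative to 1 reader)."""
    base_thrpt = results[1][1]
    rows = []
    for n, (time_ms, thrpt) in sorted(results.items()):
        total = total_mib(time_ms, thrpt)
        rows.append((n, total, total / n, thrpt / base_thrpt))
    return rows


if __name__ == "__main__":
    for name, results in (("s3", S3_RESULTS), ("fs", FS_RESULTS)):
        for n, total, per_read, rel in summarize(results):
            print(f"{name} n={n:2d} total={total:7.2f} MiB "
                  f"per_read={per_read:.2f} MiB relative_thrpt={rel:.2f}x")
```

Under those assumptions every row works out to about 8 MiB per read, so the throughput column reflects how well the fixed-size reads overlap: it rises up to 4 parallel reads on s3 and 2 on fs, then flattens or drops.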
Good, let's close. |
Summary
This was found during databend ontime dataset tests on AWS EC2 and S3: How to
parallel_read_threads
is 1; if we set it to 4:
All queries take almost the same time as without the setting.
How to check:
If we check out the
f9971bdf335333ffc2253b60b0842b2a3c8ca6cc
commit:
Then setting
parallel_read_threads=4
gives a large performance improvement.
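For readers unfamiliar with what a setting like `parallel_read_threads` is expected to buy, here is a minimal Python sketch of a parallel range read: a file is split into byte ranges and each range is read by its own worker thread. This is only an illustration of the general technique against a local file; it is not databend's or opendal's implementation, and `read_range` and `parallel_read` are hypothetical names.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset` using a private file handle."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path, threads):
    """Read the whole file as roughly `threads` equal byte ranges, concurrently."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    size = os.path.getsize(path)
    if size == 0:
        return b""
    chunk = -(-size // threads)  # ceiling division
    ranges = [(off, min(chunk, size - off)) for off in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in input order, so the parts join back correctly.
        parts = list(pool.map(lambda r: read_range(path, *r), ranges))
    return b"".join(parts)


if __name__ == "__main__":
    data = bytes(range(256)) * 4096  # 1 MiB of deterministic bytes
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    try:
        for n in (1, 4):
            assert parallel_read(tmp.name, n) == data
        print("parallel reads match the original data")
    finally:
        os.remove(tmp.name)
```

If a change in thread count from 1 to 4 makes no measurable difference, as reported here, one thing worth checking is whether the ranges are actually being issued concurrently or are serialized somewhere upstream of the reader.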