Avoid an Arc::clone per row in benchmark #1975

Merged (1 commit) on Mar 11, 2022

@jhorstmann jhorstmann commented Mar 10, 2022

Which issue does this PR close?

Closes #1973.

Rationale for this change

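Slightly improves the performance of writing rows: in the measurements below, the non-JIT row serializer benchmark drops from roughly 2.07 s to 1.70 s, while the JIT variant is essentially unchanged.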

What changes are included in this PR?

To avoid cloning the SchemaRef, we pass in the schema as a separate parameter. I also marked the benchmark functions as inline(never) so that they stand out more in the profiler. Since they operate on large chunks of data, this should not add any measurable overhead.
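A minimal sketch of the idea, not the actual DataFusion code: `Schema`, `Batch`, `write_row`, and both `write_rows_*` functions are hypothetical stand-ins, and the "before" shape is only an assumption about what a per-row clone looks like. The first variant clones the `Arc` for every row; the second borrows the schema once and passes it as a separate parameter.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real schema types; not DataFusion's API.
#[derive(Debug)]
struct Schema {
    field_widths: Vec<usize>,
}
type SchemaRef = Arc<Schema>;

struct Batch {
    schema: SchemaRef,
    rows: Vec<Vec<u8>>,
}

// Copies at most `sum(field_widths)` bytes of `row` into `out`.
fn write_row(schema: &Schema, row: &[u8], out: &mut Vec<u8>) -> usize {
    let width: usize = schema.field_widths.iter().sum();
    let n = width.min(row.len());
    out.extend_from_slice(&row[..n]);
    n
}

// Assumed "before" shape: one Arc::clone (an atomic reference-count
// increment, plus a decrement on drop) for every row.
#[inline(never)] // keeps the function visible as its own frame in a profiler
fn write_rows_cloning(batch: &Batch, out: &mut Vec<u8>) -> usize {
    let mut written = 0;
    for row in &batch.rows {
        let schema: SchemaRef = Arc::clone(&batch.schema);
        written += write_row(&schema, row, out);
    }
    written
}

// "After" shape: the schema is borrowed once and passed separately,
// so the loop never touches the reference count.
#[inline(never)]
fn write_rows_borrowing(schema: &Schema, rows: &[Vec<u8>], out: &mut Vec<u8>) -> usize {
    rows.iter().map(|row| write_row(schema, row, out)).sum()
}
```

Both variants produce the same bytes; the only difference is the per-row reference-count traffic, which is what the change removes.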

Benchmark results on i7-10510U, run with $ RUSTFLAGS="-C target-cpu=skylake" cargo bench --features row,jit --bench jit:

master branch:

row serializer          time:   [2.0518 s 2.0745 s 2.1029 s]
row serializer jit      time:   [1.8530 s 1.8626 s 1.8723 s]

this branch:

row serializer          time:   [1.6923 s 1.7042 s 1.7161 s]
row serializer jit      time:   [1.8468 s 1.8562 s 1.8657 s]
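To put the numbers in relative terms, here is a small sketch that computes the time reduction from the middle value of each bracket, assuming that middle value is the point estimate (as in Criterion-style output). The helper name `reduction_percent` is made up for illustration.

```rust
/// Relative time reduction, in percent, going from `before` to `after`.
fn reduction_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Middle values of the brackets reported above, in seconds.
fn reported_reductions() -> (f64, f64) {
    let row_serializer = reduction_percent(2.0745, 1.7042);
    let row_serializer_jit = reduction_percent(1.8626, 1.8562);
    (row_serializer, row_serializer_jit)
}
```

With these inputs the non-JIT serializer comes out roughly 18% faster, while the JIT variant changes by well under 1%, which is within the spread of its own bracket.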

If I understand the code correctly, the JIT calls the same write_field_xyz functions as the Rust version but is not able to inline them. So it avoids the type dispatch, but in exchange makes several more function calls than the Rust code, which is able to inline some of the write_field functions. It should be possible to speed up the JIT a lot if it could directly generate code corresponding to the write_field methods, so that the code gets inlined and the downcasting is avoided as well.
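The trade-off described here can be illustrated with a simplified sketch. Everything in it is hypothetical (`DataType`, `Column`, `write_i64`, `write_f64`, `plan`); it is neither the real row writer nor Cranelift output. The first path matches on the type for every field, where the compiler is free to inline the writers; the second resolves the writers once into function pointers, which removes the dispatch but leaves one opaque call and one downcast per field.

```rust
use std::any::Any;

// Hypothetical, simplified type tag and type-erased column.
enum DataType {
    Int64,
    Float64,
}
type Column = Box<dyn Any>;

fn write_i64(col: &dyn Any, row: usize, out: &mut Vec<u8>) {
    // The downcast happens on every call in both paths.
    let values = col.downcast_ref::<Vec<i64>>().expect("Int64 column");
    out.extend_from_slice(&values[row].to_le_bytes());
}

fn write_f64(col: &dyn Any, row: usize, out: &mut Vec<u8>) {
    let values = col.downcast_ref::<Vec<f64>>().expect("Float64 column");
    out.extend_from_slice(&values[row].to_le_bytes());
}

// Interpreted path: type dispatch per field, but the calls are direct,
// so the compiler may inline the writers into the match arms.
fn write_row_dispatch(types: &[DataType], cols: &[Column], row: usize, out: &mut Vec<u8>) {
    for (ty, col) in types.iter().zip(cols) {
        // `&**col` reborrows the boxed value itself as `&dyn Any`;
        // passing `col` directly would erase the Box instead.
        match ty {
            DataType::Int64 => write_i64(&**col, row, out),
            DataType::Float64 => write_f64(&**col, row, out),
        }
    }
}

// JIT-like path: the writers are resolved once up front, so there is no
// per-row match, but every call goes through a function pointer, which
// generally prevents inlining.
type WriteFn = fn(&dyn Any, usize, &mut Vec<u8>);

fn plan(types: &[DataType]) -> Vec<WriteFn> {
    types
        .iter()
        .map(|ty| match ty {
            DataType::Int64 => write_i64 as WriteFn,
            DataType::Float64 => write_f64 as WriteFn,
        })
        .collect()
}

fn write_row_planned(plan: &[WriteFn], cols: &[Column], row: usize, out: &mut Vec<u8>) {
    for (write, col) in plan.iter().zip(cols) {
        write(&**col, row, out);
    }
}
```

Generating the body of the writers directly, instead of calling them, would remove both the call and the downcast from the second path, which is the improvement suggested above.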

@github-actions bot added the datafusion label (Changes in the datafusion crate) on Mar 10, 2022

@yjshen yjshen left a comment

Remarkable findings and analysis! So I suppose our next step is to optimize the performance of the JIT path further?

yjshen commented Mar 11, 2022

Cc @alamb @houqp You may also be interested in this.

houqp commented Mar 11, 2022

Good catch 👍

@houqp added the performance label (Make DataFusion faster) on Mar 11, 2022
@houqp houqp merged commit a6a1bc9 into apache:master Mar 11, 2022

yjshen commented Mar 15, 2022

After searching and discussing with @houqp, it seems complicated to make Cranelift inline Rust functions into JIT code. I want to try out LLVM, which has both assembly and IR inlining capabilities. I will report here if I make some progress.

Quoting the Postgres JIT docs:

One big advantage of JITing expressions is that it can significantly
reduce the overhead of PostgreSQL's extensible function/operator
mechanism, by inlining the body of called functions/operators.

It obviously is undesirable to maintain a second implementation of
commonly used functions, just for inlining purposes. Instead we take
advantage of the fact that the Clang compiler can emit LLVM IR.

The ability to do so allows us to get the LLVM IR for all operators
(e.g. int8eq, float8pl etc), without maintaining two copies. These
bitcode files get installed into the server's
$pkglibdir/bitcode/postgres/
Using existing LLVM functionality (for parallel LTO compilation),
additionally an index is over these is stored to
$pkglibdir/bitcode/postgres.index.bc

https://github.com/postgres/postgres/blob/7e12256b478b89518ff410f29192af21de37d070/src/backend/jit/README#L192-L219

Linked issue: Unnecessary Arc::clone per row in RowWriter benchmark (#1973)