Avoid an Arc::clone per row in benchmark #1975

jhorstmann · 2022-03-10T10:53:59Z

Which issue does this PR close?

Closes #1973.

Rationale for this change

Slightly improves the performance of writing rows.

What changes are included in this PR?

To avoid cloning the SchemaRef for every row, the schema is now passed in as a separate parameter. I also marked the benchmark functions as #[inline(never)] so that they stand out more in the profiler. Since they operate on large chunks of data, this should not add any measurable overhead.
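For readers who want to see the shape of the change, here is a minimal sketch. It is not the actual DataFusion code: `Schema`, `Batch`, `row_width`, and both `write_rows_*` functions are hypothetical stand-ins, and only `Arc` comes from the standard library. It contrasts an accessor that returns an owned `SchemaRef` (one atomic reference-count increment and decrement per row) with resolving the schema once and borrowing it.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real schema and batch types.
struct Schema {
    field_widths: Vec<usize>,
}
type SchemaRef = Arc<Schema>;

struct Batch {
    schema: SchemaRef,
    rows: usize,
}

impl Batch {
    // Returns an owned handle, so every call bumps the atomic reference count.
    fn schema(&self) -> SchemaRef {
        Arc::clone(&self.schema)
    }
}

fn row_width(schema: &Schema) -> usize {
    schema.field_widths.iter().sum()
}

// Before: one Arc::clone (and one drop) per row.
#[inline(never)] // keeps the function visible as its own frame in a profiler
fn write_rows_cloning(batch: &Batch) -> usize {
    let mut written = 0;
    for _ in 0..batch.rows {
        let schema = batch.schema();
        written += row_width(&schema);
    }
    written
}

// After: the caller passes the schema in, so the loop only borrows it.
#[inline(never)]
fn write_rows_borrowing(batch: &Batch, schema: &Schema) -> usize {
    let mut written = 0;
    for _ in 0..batch.rows {
        written += row_width(schema);
    }
    written
}

fn main() {
    let schema: SchemaRef = Arc::new(Schema { field_widths: vec![8, 4, 1] });
    let batch = Batch { schema: Arc::clone(&schema), rows: 1_000 };
    let cloned = write_rows_cloning(&batch);
    let borrowed = write_rows_borrowing(&batch, &schema);
    assert_eq!(cloned, borrowed);
    println!("both variants wrote {cloned} bytes");
}
```

Both variants compute the same result; the only difference is whether the reference count is touched inside the hot loop.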

Benchmark results on i7-10510U, run with $ RUSTFLAGS="-C target-cpu=skylake" cargo bench --features row,jit --bench jit:

master branch:

row serializer          time:   [2.0518 s 2.0745 s 2.1029 s]
row serializer jit      time:   [1.8530 s 1.8626 s 1.8723 s]

this branch:

row serializer          time:   [1.6923 s 1.7042 s 1.7161 s]
row serializer jit      time:   [1.8468 s 1.8562 s 1.8657 s]

If I understand the code correctly, the JIT calls the same write_field_xyz functions as the Rust version but is not able to inline them. It therefore avoids the type dispatch, but makes several more function calls than the Rust code, which is able to inline some of the write_field functions. It should be possible to speed up the JIT considerably if it could directly generate code corresponding to the write_field methods, so that this code gets inlined and the downcasting is avoided as well.
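The trade-off can be illustrated with a rough analogy in plain Rust. This is only a sketch under stated assumptions: the `DataType` enum, the three `write_field_*` helpers, and the `Writer` function-pointer type are hypothetical and much simpler than the real row writer. The first path matches on the type for every field but lets the compiler inline the writers; the second resolves the writers once per schema and then makes an indirect call per field, which is roughly the position generated code is in when it can only call into precompiled functions.

```rust
// Hypothetical, simplified type tag; the real code handles many more types.
enum DataType {
    Int32,
    Int64,
    Boolean,
}

// Hypothetical per-type writers, named after the write_field_xyz family.
fn write_field_i32(out: &mut Vec<u8>, v: i64) {
    out.extend_from_slice(&(v as i32).to_le_bytes());
}
fn write_field_i64(out: &mut Vec<u8>, v: i64) {
    out.extend_from_slice(&v.to_le_bytes());
}
fn write_field_bool(out: &mut Vec<u8>, v: i64) {
    out.push((v != 0) as u8);
}

// Type dispatch per field, but the direct calls are candidates for inlining.
fn write_row_dispatch(types: &[DataType], values: &[i64], out: &mut Vec<u8>) {
    for (ty, v) in types.iter().zip(values) {
        match ty {
            DataType::Int32 => write_field_i32(out, *v),
            DataType::Int64 => write_field_i64(out, *v),
            DataType::Boolean => write_field_bool(out, *v),
        }
    }
}

type Writer = fn(&mut Vec<u8>, i64);

// Dispatch happens once per schema instead of once per field per row.
fn resolve_writers(types: &[DataType]) -> Vec<Writer> {
    types
        .iter()
        .map(|ty| match ty {
            DataType::Int32 => write_field_i32 as Writer,
            DataType::Int64 => write_field_i64 as Writer,
            DataType::Boolean => write_field_bool as Writer,
        })
        .collect()
}

// No match in the loop, but every field costs an indirect, non-inlined call.
fn write_row_resolved(writers: &[Writer], values: &[i64], out: &mut Vec<u8>) {
    for (write, v) in writers.iter().zip(values) {
        write(out, *v);
    }
}

fn main() {
    let types = [DataType::Int32, DataType::Int64, DataType::Boolean];
    let values = [7i64, -1, 1];

    let mut dispatched = Vec::new();
    write_row_dispatch(&types, &values, &mut dispatched);

    let mut resolved = Vec::new();
    write_row_resolved(&resolve_writers(&types), &values, &mut resolved);

    assert_eq!(dispatched, resolved);
    println!("{} bytes per row", dispatched.len());
}
```

Which path is faster depends on how well the direct calls inline, which may be why removing the dispatch alone gives the JIT only a small edge in the numbers above.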

yjshen

Remarkable findings and analysis! So I suppose our next step is to optimize the performance of the JIT path further?

yjshen · 2022-03-11T03:02:09Z

Cc @alamb @houqp You may also be interested in this.

houqp · 2022-03-11T05:31:28Z

Good catch 👍

yjshen · 2022-03-15T01:59:29Z

After searching and discussing with @houqp, it seems complicated to make Cranelift inline Rust functions into JIT code. I want to try out LLVM, which offers both assembly and IR inlining capabilities. I will report back here if I make some progress.

Quoting the Postgres JIT docs here:

One big advantage of JITing expressions is that it can significantly
reduce the overhead of PostgreSQL's extensible function/operator
mechanism, by inlining the body of called functions/operators.

It obviously is undesirable to maintain a second implementation of
commonly used functions, just for inlining purposes. Instead we take
advantage of the fact that the Clang compiler can emit LLVM IR.

The ability to do so allows us to get the LLVM IR for all operators
(e.g. int8eq, float8pl etc), without maintaining two copies. These
bitcode files get installed into the server's
$pkglibdir/bitcode/postgres/
Using existing LLVM functionality (for parallel LTO compilation),
additionally an index is over these is stored to
$pkglibdir/bitcode/postgres.index.bc

https://github.com/postgres/postgres/blob/7e12256b478b89518ff410f29192af21de37d070/src/backend/jit/README#L192-L219

Avoid an Arc::clone per row in benchmark

583443a

github-actions bot added the datafusion Changes in the datafusion crate label Mar 10, 2022

yjshen approved these changes Mar 10, 2022

andygrove approved these changes Mar 10, 2022

houqp added the performance Make DataFusion faster label Mar 11, 2022

houqp merged commit a6a1bc9 into apache:master Mar 11, 2022

yjshen mentioned this pull request Mar 30, 2022

JIT-compille DataFusion expression with column name #2124

Merged
