Benchmark and optimize PyArray2/3::from_vec2/3 #292

adamreichold · 2022-03-12T12:22:30Z

Follow-up to #291. Admittedly, performance-conscious code will probably try to avoid these methods, but they are relatively straightforward to optimize, as using pointer traversal instead of indexing brings a large improvement:

 name              main ns/iter  indexing ns/iter  diff ns/iter   diff %  speedup 
 from_vec2_large   435,855       10,061                -425,794  -97.69%  x 43.32 
 from_vec2_medium  6,699         507                     -6,192  -92.43%  x 13.21 
 from_vec2_small   702           325                       -377  -53.70%   x 2.16 
 from_vec3_large   303,579       7,109                 -296,470  -97.66%  x 42.70 
 from_vec3_medium  38,732        3,000                  -35,732  -92.25%  x 12.91 
 from_vec3_small   950           410                       -540  -56.84%   x 2.32
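A minimal sketch of the pointer-traversal idea, assuming a plain `Vec<T>` stands in for the NumPy array's data buffer (the helper name `flatten_rows_ptr` is hypothetical and this is not rust-numpy's actual code). Instead of computing a `row * ncols + col` offset for every element, a single write pointer is advanced by one after each element.

```rust
/// Hypothetical helper: copies a rectangular `Vec<Vec<T>>` into one
/// contiguous, row-major buffer by advancing a single write pointer.
fn flatten_rows_ptr<T: Clone>(rows: &[Vec<T>]) -> Vec<T> {
    let ncols = rows.first().map_or(0, |row| row.len());
    // Reject ragged input up front so the pointer writes below stay in bounds.
    assert!(rows.iter().all(|row| row.len() == ncols), "ragged rows");

    let len = rows.len() * ncols;
    let mut out = Vec::<T>::with_capacity(len);
    let mut dst = out.as_mut_ptr();
    for row in rows {
        for elem in row {
            unsafe {
                // SAFETY: exactly `len` elements are written in total and the
                // buffer has capacity for at least `len` elements.
                dst.write(elem.clone());
                dst = dst.add(1);
            }
        }
    }
    // SAFETY: all `len` elements were initialized above. If a `clone` panics
    // earlier, `out` still has length 0, so written elements leak but no
    // uninitialized memory is ever dropped.
    unsafe { out.set_len(len) };
    out
}
```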

Optimizing for T::IS_COPY elements on top of this is then a bit of a mixed bag:

 name              indexing ns/iter  copying ns/iter  diff ns/iter   diff %  speedup 
 from_vec2_large   10,061            10,265                    204    2.03%   x 0.98 
 from_vec2_medium  507               527                        20    3.94%   x 0.96 
 from_vec2_small   325               317                        -8   -2.46%   x 1.03 
 from_vec3_large   7,109             7,084                     -25   -0.35%   x 1.00 
 from_vec3_medium  3,000             1,735                  -1,265  -42.17%   x 1.73 
 from_vec3_small   410               404                        -6   -1.46%   x 1.01

I suspect that this is due to the innermost slices being too short for the overhead of calling ptr::copy_nonoverlapping to amortize itself. But I think we can still keep it, as it does not make things significantly worse, might help in specific cases with long rows, and should not add code bloat since T::IS_COPY is resolved at compile time.
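A sketch of the T::IS_COPY branch under the same assumptions as before. The `ElementLike` trait, its two impls, and the helper names are hypothetical stand-ins for rust-numpy's element trait. The point is that branching on an associated constant is resolved per type at compile time, so each instantiation keeps only one of the two paths: a bulk `ptr::copy_nonoverlapping` per row, or a per-element clone.

```rust
use std::ptr;

/// Hypothetical element trait exposing a compile-time `IS_COPY` flag.
trait ElementLike: Clone {
    const IS_COPY: bool;
}

impl ElementLike for f64 {
    const IS_COPY: bool = true;
}

impl ElementLike for String {
    const IS_COPY: bool = false;
}

/// Writes one row at `dst` and returns the pointer just past it.
///
/// # Safety
/// `dst` must be valid for writes of `row.len()` elements not overlapping `row`.
unsafe fn write_row<T: ElementLike>(row: &[T], dst: *mut T) -> *mut T {
    if T::IS_COPY {
        // A bitwise copy is only sound because `IS_COPY` promises that `T`
        // is trivially copyable; the branch is constant-folded per type.
        unsafe {
            ptr::copy_nonoverlapping(row.as_ptr(), dst, row.len());
            dst.add(row.len())
        }
    } else {
        let mut dst = dst;
        for elem in row {
            unsafe {
                dst.write(elem.clone());
                dst = dst.add(1);
            }
        }
        dst
    }
}

fn flatten_rows_copy<T: ElementLike>(rows: &[Vec<T>]) -> Vec<T> {
    let ncols = rows.first().map_or(0, |row| row.len());
    assert!(rows.iter().all(|row| row.len() == ncols), "ragged rows");
    let len = rows.len() * ncols;
    let mut out = Vec::<T>::with_capacity(len);
    let mut dst = out.as_mut_ptr();
    for row in rows {
        // SAFETY: rows are not ragged, so at most `len` elements are written.
        dst = unsafe { write_row(row, dst) };
    }
    // SAFETY: every element was initialized by `write_row`.
    unsafe { out.set_len(len) };
    out
}
```

With short rows, each `copy_nonoverlapping` call moves only a handful of elements, which is consistent with the suspicion above that the call overhead does not amortize.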

adamreichold · 2022-03-12T15:07:25Z

Not doing an upfront pass to check for ragged arrays brings another single-digit percentage win:

 name              copying ns/iter  single-pass ns/iter  diff ns/iter   diff %  speedup 
 from_vec2_large   10,265           9,897                        -368   -3.58%   x 1.04 
 from_vec2_medium  527              501                           -26   -4.93%   x 1.05 
 from_vec2_small   317              314                            -3   -0.95%   x 1.01 
 from_vec3_large   7,084            6,943                        -141   -1.99%   x 1.02 
 from_vec3_medium  1,735            1,548                        -187  -10.78%   x 1.12 
 from_vec3_small   404              395                            -9   -2.23%   x 1.02
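A sketch of the single-pass variant, again with a `Vec<T>` standing in for the array buffer: each row's length is checked when that row is reached instead of in a separate pass over all rows. `RaggedRows` and `flatten_rows_single_pass` are hypothetical names. On failure, the elements already written are handed back to the `Vec` so they are dropped rather than leaked.

```rust
/// Hypothetical error describing the first row whose length differs.
#[derive(Debug, PartialEq)]
struct RaggedRows {
    row: usize,
    expected: usize,
    found: usize,
}

fn flatten_rows_single_pass<T: Clone>(rows: &[Vec<T>]) -> Result<Vec<T>, RaggedRows> {
    let ncols = rows.first().map_or(0, |row| row.len());
    let len = rows.len() * ncols;
    let mut out = Vec::<T>::with_capacity(len);
    let mut dst = out.as_mut_ptr();
    for (i, row) in rows.iter().enumerate() {
        // Validate lazily: only rows that are actually reached are checked.
        if row.len() != ncols {
            // SAFETY: the `i` complete rows before this one wrote exactly
            // `i * ncols` initialized elements, which are now dropped with `out`.
            unsafe { out.set_len(i * ncols) };
            return Err(RaggedRows { row: i, expected: ncols, found: row.len() });
        }
        for elem in row {
            unsafe {
                // SAFETY: this row has `ncols` elements, so writes stay within `len`.
                dst.write(elem.clone());
                dst = dst.add(1);
            }
        }
    }
    // SAFETY: every row matched `ncols`, so all `len` elements are initialized.
    unsafe { out.set_len(len) };
    Ok(out)
}
```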

davidhewitt

This plus #291 is very cool - always nice to deliver performance improvements! Just a few random thoughts.

src/array.rs

davidhewitt · 2022-03-17T13:46:28Z

src/array.rs

-            assert!(idx == len);
-            array
-        }
+        let data = iter.collect::<Box<[_]>>();


It looks like the FromIterator implementation for boxed slices just goes through Vec anyway, so do we really need the from_exact_iter variant?

https://doc.rust-lang.org/src/alloc/boxed.rs.html#1880-1884
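The linked impl boils down to collecting into a `Vec` and then calling `into_boxed_slice`, which only has to shrink the allocation when the `Vec` has excess capacity. A small sketch mirroring that path (`collect_boxed_via_vec` is a hypothetical name) and checking it against collecting into `Box<[_]>` directly:

```rust
/// Mirrors the standard library's `FromIterator<T> for Box<[T]>`:
/// collect into a `Vec` first, then convert it into a boxed slice.
fn collect_boxed_via_vec<I: IntoIterator>(iter: I) -> Box<[I::Item]> {
    iter.into_iter().collect::<Vec<_>>().into_boxed_slice()
}
```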

My initial aim was to avoid changing the API, but since from_*iter is probably not the most used API and we are still at version 0.x, that constraint does not really seem necessary.

As for using Box<[_]> instead of Vec<_>, I was thinking that the exact case would not end up with excess capacity and hence could go from Vec to boxed slice without reallocation. But this is not really useful, as a boxed slice and a Vec without excess capacity end up creating basically the same PySliceContainer instance.

Hence, I'll add a commit removing the from_exact_iter method, as any specializations that the standard library provides will pass through the generic signature of from_iter in any case.

@davidhewitt Please let me know whether changing the API without a deprecation is alright before I merge. Thanks!

src/array.rs

davidhewitt

Happy with this as-is, though I do prefer deprecating where possible myself.
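Deprecating rather than removing could look roughly like the following sketch. `Array1` is a hypothetical stand-in for PyArray, and the note text is an assumption; the idea is simply that the deprecated `from_exact_iter` keeps compiling but forwards to `from_iter`.

```rust
/// Hypothetical container standing in for a type such as `PyArray`.
pub struct Array1<T> {
    data: Vec<T>,
}

impl<T> Array1<T> {
    pub fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        Array1 { data: iter.into_iter().collect() }
    }

    /// Kept for compatibility; callers get a compiler warning pointing at `from_iter`.
    #[deprecated(note = "use `from_iter` instead, it handles exact-size iterators just as well")]
    pub fn from_exact_iter<I: ExactSizeIterator<Item = T>>(iter: I) -> Self {
        Self::from_iter(iter)
    }

    pub fn as_slice(&self) -> &[T] {
        &self.data
    }
}
```

Existing callers keep working and only see a warning, which is the trade-off against removing the method outright in a 0.x release.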

adamreichold force-pushed the copy-from-vec23 branch from 65e4654 to 7bd0630, March 12, 2022 13:49

adamreichold force-pushed the copy-from-vec23 branch 4 times, most recently from 6664de8 to d384132, March 17, 2022 11:26

davidhewitt approved these changes Mar 17, 2022

adamreichold force-pushed the copy-from-vec23 branch 4 times, most recently from d273589 to 4c89dee, March 17, 2022 18:46

davidhewitt approved these changes Mar 18, 2022

adamreichold added 9 commits March 18, 2022 11:53

Add benchmarks for PyArray2/3::from_vec2/3.

2c16606

Avoid the repeated index computations when copying into a contiguous array for from_vec2/3.

2d042cb

Copy the contiguous rows if the element type allows it in from_vec2/3.

b6c86d0

Avoid enumerating the iterators in PyArray1::from_slice and ArrayBase::to_pyarray.

a8815fd

Do not pessimize from_vec2/3 towards rejecting ragged arrays.

6864a8a

Factor common code for clone elements into a given data pointer and use it in PyArray::from_slice/vec2/vec3.

0a6c3a4

Tweak code generation of PyArray::from_vec(2/3).

a80e9c7

Deprecate PyArray::from_exact_iter as it does not really add anything on top of PyArray::from_iter.

53df004

Recover lost coverage.

0b3a83e

adamreichold force-pushed the copy-from-vec23 branch from 4c89dee to 0b3a83e, March 18, 2022 10:53

adamreichold merged commit 833896d into main Mar 18, 2022

adamreichold deleted the copy-from-vec23 branch March 18, 2022 11:04
