Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark and optimize PyArray2/3::from_vec2/3 #292

Merged
merged 9 commits into from
Mar 18, 2022
Merged

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented Mar 12, 2022

Follow-up to #291. Admittedly, performance conscious code will probably try to avoid these methods, but it is relatively straight-forward to optimize them, as using pointer traversal instead of indexing brings a large improvement

 name              main ns/iter  indexing ns/iter  diff ns/iter   diff %  speedup 
 from_vec2_large   435,855       10,061                -425,794  -97.69%  x 43.32 
 from_vec2_medium  6,699         507                     -6,192  -92.43%  x 13.21 
 from_vec2_small   702           325                       -377  -53.70%   x 2.16 
 from_vec3_large   303,579       7,109                 -296,470  -97.66%  x 42.70 
 from_vec3_medium  38,732        3,000                  -35,732  -92.25%  x 12.91 
 from_vec3_small   950           410                       -540  -56.84%   x 2.32 

and optimizing for T::IS_COPY elements is then a bit of a mixed bag on top of this

 name              indexing ns/iter  copying ns/iter  diff ns/iter   diff %  speedup 
 from_vec2_large   10,061            10,265                    204    2.03%   x 0.98 
 from_vec2_medium  507               527                        20    3.94%   x 0.96 
 from_vec2_small   325               317                        -8   -2.46%   x 1.03 
 from_vec3_large   7,109             7,084                     -25   -0.35%   x 1.00 
 from_vec3_medium  3,000             1,735                  -1,265  -42.17%   x 1.73 
 from_vec3_small   410               404                        -6   -1.46%   x 1.01 

I suspect that this is due to the innermost slices being too short for the overhead of calling ptr::copy_nonoverlapping to amortize itself, but I think we can still keep as it not making things significantly worse and might help in specific cases of large rows and should not add code bloat due to T::IS_COPY being resolved at compile time.

@adamreichold
Copy link
Member Author

Not doing an upfront pass to check for ragged arrays brings another single digit percentage win:

 name              copying ns/iter  single-pass ns/iter  diff ns/iter   diff %  speedup 
 from_vec2_large   10,265           9,897                        -368   -3.58%   x 1.04 
 from_vec2_medium  527              501                           -26   -4.93%   x 1.05 
 from_vec2_small   317              314                            -3   -0.95%   x 1.01 
 from_vec3_large   7,084            6,943                        -141   -1.99%   x 1.02 
 from_vec3_medium  1,735            1,548                        -187  -10.78%   x 1.12 
 from_vec3_small   404              395                            -9   -2.23%   x 1.02 

@adamreichold adamreichold force-pushed the copy-from-vec23 branch 4 times, most recently from 6664de8 to d384132 Compare March 17, 2022 11:26
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This plus #291 is very cool - always nice to deliver performance improvements! Just a few random thoughts.

src/array.rs Show resolved Hide resolved
src/array.rs Outdated
assert!(idx == len);
array
}
let data = iter.collect::<Box<[_]>>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the implementation for FromIterator for boxed slices just goes through Vec anyway, so do we really need the from_exact_iter variant?

https://doc.rust-lang.org/src/alloc/boxed.rs.html#1880-1884

Copy link
Member Author

@adamreichold adamreichold Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial aim was to avoid changing the API, but since from_*iter is probably not the most used API and we are still at version 0.x, this does not really seem to be appropriate.

As for using Box<[_]> instead of Vec<_>, I was thinking that the exact case will not end up with excess capacity and hence could go from vec to boxed slice without re-allocation. But this is not really useful as a boxed slice and a vec without excess capacity will basically end up creating the same PySliceContainer instance.

Hence, I'll add a commit removing the from_exact_iter method as any specializations that the standard library provides with pass through the generic signature of from_iter in any case.

Copy link
Member Author

@adamreichold adamreichold Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidhewitt Please let me know if changing API is alright without before I merge. Thanks!

src/array.rs Outdated Show resolved Hide resolved
@adamreichold adamreichold force-pushed the copy-from-vec23 branch 4 times, most recently from d273589 to 4c89dee Compare March 17, 2022 18:46
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with this as-is, though I do prefer deprecating where possible myself.

src/array.rs Show resolved Hide resolved
@adamreichold adamreichold merged commit 833896d into main Mar 18, 2022
@adamreichold adamreichold deleted the copy-from-vec23 branch March 18, 2022 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants