Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Added and_scalar and or_scalar for boolean #707

Conversation

silathdiir
Copy link
Contributor

Close #663

@codecov
Copy link

codecov bot commented Dec 24, 2021

Codecov Report

Merging #707 (ad36ece) into main (f64339c) will increase coverage by 0.41%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
+ Coverage   69.77%   70.19%   +0.41%     
==========================================
  Files         303      309       +6     
  Lines       16855    16815      -40     
==========================================
+ Hits        11761    11803      +42     
+ Misses       5094     5012      -82     
Impacted Files Coverage Δ
src/compute/boolean.rs 95.34% <100.00%> (+2.01%) ⬆️
src/io/parquet/read/primitive/utils.rs 72.72% <0.00%> (-18.19%) ⬇️
src/io/avro/read/header.rs 84.61% <0.00%> (-15.39%) ⬇️
src/io/avro/read/block.rs 59.45% <0.00%> (-11.97%) ⬇️
src/datatypes/schema.rs 32.00% <0.00%> (-5.94%) ⬇️
src/io/parquet/read/mod.rs 38.84% <0.00%> (-5.92%) ⬇️
src/io/json/read/deserialize.rs 68.42% <0.00%> (-5.46%) ⬇️
src/io/avro/read/deserialize.rs 67.22% <0.00%> (-5.43%) ⬇️
src/io/avro/read_async/block.rs 78.12% <0.00%> (-4.64%) ⬇️
src/io/avro/read/mod.rs 80.00% <0.00%> (-1.25%) ⬇️
... and 88 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f64339c...ad36ece. Read the comment docs.

Copy link
Owner

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this! The amount of testing that is here is really impressive.

I think that there is an optimization here:

  • When the scalar is true, A & true = A and A | true = A.
  • When the scalar is false, A & false = false and A | false = A

i.e. we do not need to compute this item by item; we just need to clone and/or initialize all items as false / unset accordingly. This should be orders of magnitude faster since it is not O(N).

What do you think?

@silathdiir
Copy link
Contributor Author

I think that there is an optimization here:

  • When the scalar is true, A & true = A and A | true = A.
  • When the scalar is false, A & false = false and A | false = A

i.e. we do not need to compute this item by item; we just need to clone and/or initialize all items as false / unset accordingly. This should be orders of magnitude faster since it is not O(N).

What do you think?

Yes. That should be great. I will update the code. Thanks.

Copy link
Owner

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great one. Left two minor suggestions :)

src/compute/boolean.rs Outdated Show resolved Hide resolved
pub fn or_scalar(array: &BooleanArray, scalar: &BooleanScalar) -> BooleanArray {
match scalar.value() {
Some(true) => {
let values = Bitmap::from_trusted_len_iter(std::iter::repeat(true).take(array.len()));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let values = Bitmap::from_trusted_len_iter(std::iter::repeat(true).take(array.len()));
let values = Bitmap::from_len_zeroed(array.len());

Copy link
Contributor Author

@silathdiir silathdiir Dec 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this or version, I want to initialize a Bitmap with all true bits to implement A | true = true. Is that right? Thanks.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, you are right!

Unfortunately we do not have a specific method for that, but it is worth checking MutableBitmap::extend_constant here, since it sets 8 bits per write, still much faster than from_trusted_len_iter ^_^

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, the reason we have all these seemly similar APIs is that for a single API (like rust's Vec::extend) we need trait specialization in the stable channel. This is why we currently name every method differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for instruction. I updated to use MutableBitmap:: extend_constant. Thanks.

@jorgecarleitao jorgecarleitao merged commit 64043a0 into jorgecarleitao:main Dec 25, 2021
@jorgecarleitao jorgecarleitao added the feature A new feature label Dec 25, 2021
@jorgecarleitao jorgecarleitao changed the title Add and_scalar and or_scalar for boolean Added and_scalar and or_scalar for boolean Dec 25, 2021
@jorgecarleitao
Copy link
Owner

Merged. Thanks a lot, @silathdiir!

@silathdiir silathdiir deleted the add-and-scalar-and-or-scalar-to-boolean branch December 25, 2021 09:38
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature A new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add and_scalar and or_scalar for boolean
2 participants