Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

delay null_counts #994

Merged
merged 5 commits into from
May 19, 2022
Merged

Conversation

ritchie46
Copy link
Collaborator

On several extend (which may be called in hot loops) there were still null_count check on mutable arrays.

Upon freezing a MutableArray to Array there was also this pattern

// first null_count
if mutable_bitmap::null_count > 0 {
  // second null_count on conversion to bitmap
  mutable_bitmap::into_bitmap
}

I rewrote it by first doing the bitmap coversions (which computes the nulls). And then if the null_count == 0 then we set validity = None.

So basically this PR delays all null_counts until we freeze the array as we will compute it upon conversion and can then decide if we can drop the validity or not.

@@ -36,12 +37,24 @@ impl<O: Offset> From<MutableUtf8Array<O>> for Utf8Array<O> {
// Safety:
// `MutableUtf8Array` has the same invariants as `Utf8Array` and thus
// `Utf8Array` can be safely created from `MutableUtf8Array` without checks.
let validity = other
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If null_count == 0 we don't need to set the validity

@@ -368,10 +381,6 @@ impl<O: Offset> MutableUtf8Array<O> {
self.validity.as_mut().unwrap(),
iterator,
);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extend might be called a lot, which means we have quadratic behavior on the null_count compute.

@codecov
Copy link

codecov bot commented May 18, 2022

Codecov Report

Merging #994 (f7c3daf) into main (aafba7b) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #994      +/-   ##
==========================================
+ Coverage   71.41%   71.42%   +0.01%     
==========================================
  Files         357      357              
  Lines       19801    19799       -2     
==========================================
+ Hits        14140    14141       +1     
+ Misses       5661     5658       -3     
Impacted Files Coverage Δ
src/bitmap/mutable.rs 90.25% <ø> (ø)
src/array/primitive/mutable.rs 89.40% <100.00%> (+0.61%) ⬆️
src/array/utf8/mutable.rs 87.23% <100.00%> (+0.20%) ⬆️
src/compute/arithmetics/time.rs 26.60% <0.00%> (+0.91%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aafba7b...f7c3daf. Read the comment docs.

@ritchie46
Copy link
Collaborator Author

@jorgecarleitao I added an extra test and that required met to clone the MutableArray. I derived Clone for MutablePrimitiveArray and MutableBitmap as that seemed a reasonable trait to have to me.

Was there any reason it was not deriving clone other than just not yet added?

@jorgecarleitao jorgecarleitao merged commit 267cac0 into jorgecarleitao:main May 19, 2022
@jorgecarleitao
Copy link
Owner

Awesome, @ritchie46 !

No reason - at the time the MutableBuffer was not Clone and thus none was clone - meanwhile we moved to Vec which is clone and thus everything can be clone ^^

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants