delay null_counts #994
src/array/utf8/mutable.rs
@@ -36,12 +37,24 @@ impl<O: Offset> From<MutableUtf8Array<O>> for Utf8Array<O> {
// Safety:
// `MutableUtf8Array` has the same invariants as `Utf8Array` and thus
// `Utf8Array` can be safely created from `MutableUtf8Array` without checks.
let validity = other
If `null_count == 0`, we don't need to set the validity.
@@ -368,10 +381,6 @@ impl<O: Offset> MutableUtf8Array<O> {
self.validity.as_mut().unwrap(),
iterator,
);
This `extend` might be called a lot, which means recomputing `null_count` on every call gives us quadratic behavior.
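To make the quadratic cost concrete, here is a minimal sketch. It assumes a toy `ToyValidity` type backed by a `Vec<bool>`; this is a hypothetical stand-in, not arrow2's actual bitmap type, and the `bits_scanned` counter exists only to make the work visible. Recounting nulls after every `extend` rescans everything appended so far, while counting once at the end scans each bit a single time.

```rust
/// Toy stand-in for a growable validity bitmap (one `bool` per slot).
/// Hypothetical type for illustration, not arrow2's `MutableBitmap`.
struct ToyValidity {
    bits: Vec<bool>,
    /// Number of bits inspected while counting nulls, to make the cost visible.
    bits_scanned: usize,
}

impl ToyValidity {
    fn new() -> Self {
        Self { bits: Vec::new(), bits_scanned: 0 }
    }

    /// Counts unset (null) bits by scanning the whole bitmap.
    fn null_count(&mut self) -> usize {
        self.bits_scanned += self.bits.len();
        self.bits.iter().filter(|valid| !**valid).count()
    }

    /// Eager variant: recount the nulls after every `extend`.
    fn extend_eager(&mut self, chunk: &[bool]) -> usize {
        self.bits.extend_from_slice(chunk);
        self.null_count()
    }

    /// Delayed variant: only append; counting is left for later.
    fn extend_delayed(&mut self, chunk: &[bool]) {
        self.bits.extend_from_slice(chunk);
    }
}

/// Returns (bits scanned by the eager variant, bits scanned by the delayed variant).
fn compare_scans(chunks: usize, chunk_len: usize) -> (usize, usize) {
    let chunk = vec![true; chunk_len];
    let mut eager = ToyValidity::new();
    let mut delayed = ToyValidity::new();
    for _ in 0..chunks {
        eager.extend_eager(&chunk);
        delayed.extend_delayed(&chunk);
    }
    // The delayed variant counts exactly once, after all chunks are appended.
    delayed.null_count();
    (eager.bits_scanned, delayed.bits_scanned)
}

fn main() {
    let (eager, delayed) = compare_scans(100, 10);
    println!("eager: {} bits scanned, delayed: {} bits scanned", eager, delayed);
}
```

With 100 chunks of 10 slots each, the eager variant scans 10 + 20 + ... + 1000 = 50500 bits, while the delayed variant scans 1000.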
Codecov Report
@@ Coverage Diff @@
## main #994 +/- ##
==========================================
+ Coverage 71.41% 71.42% +0.01%
==========================================
Files 357 357
Lines 19801 19799 -2
==========================================
+ Hits 14140 14141 +1
+ Misses 5661 5658 -3
@jorgecarleitao I added an extra test and that required me to clone the Was there any reason it was not deriving `Clone` other than just not yet added?
Awesome, @ritchie46! No reason - at the time the
In several `extend` methods (which may be called in hot loops) there were still `null_count` checks on mutable arrays.

Upon freezing a `MutableArray` to `Array` there was also this pattern

I rewrote it by first doing the `bitmap` conversions (which compute the nulls). Then, if `null_count == 0`, we set `validity = None`.

So basically this PR delays all `null_counts` until we freeze the array, since we compute it upon conversion anyway and can then decide whether we can drop the validity or not.
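The freeze step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from this PR: `ToyMutableArray`, `ToyBitmap`, `ToyArray` and `freeze` are hypothetical names, and a `Vec<bool>` stands in for the real bitmap. The conversion to the immutable bitmap counts the nulls once, and the validity is kept only if that count is non-zero.

```rust
/// Hypothetical mutable array: values plus an optional validity mask.
struct ToyMutableArray {
    values: Vec<String>,
    validity: Option<Vec<bool>>,
}

/// Hypothetical immutable bitmap that counts its unset bits once, at construction.
#[derive(Debug, PartialEq)]
struct ToyBitmap {
    bits: Vec<bool>,
    null_count: usize,
}

impl From<Vec<bool>> for ToyBitmap {
    fn from(bits: Vec<bool>) -> Self {
        let null_count = bits.iter().filter(|valid| !**valid).count();
        Self { bits, null_count }
    }
}

/// Hypothetical immutable array produced by freezing the mutable one.
#[derive(Debug, PartialEq)]
struct ToyArray {
    values: Vec<String>,
    validity: Option<ToyBitmap>,
}

impl From<ToyMutableArray> for ToyArray {
    fn from(other: ToyMutableArray) -> Self {
        // Convert the bitmap first (this is where the nulls get counted),
        // then drop the validity entirely if nothing is null.
        let validity = other
            .validity
            .map(ToyBitmap::from)
            .filter(|bitmap| bitmap.null_count > 0);
        Self { values: other.values, validity }
    }
}

/// Small helper for the example: build a mutable array and freeze it.
fn freeze(values: &[&str], validity: Option<Vec<bool>>) -> ToyArray {
    ToyMutableArray {
        values: values.iter().map(|s| s.to_string()).collect(),
        validity,
    }
    .into()
}

fn main() {
    // All slots valid: the validity is dropped on freeze.
    let all_valid = freeze(&["a", "b"], Some(vec![true, true]));
    assert!(all_valid.validity.is_none());

    // One null slot: the validity is kept, with its null count already known.
    let with_null = freeze(&["a", ""], Some(vec![true, false]));
    assert_eq!(with_null.validity.map(|b| b.null_count), Some(1));
}
```

In this sketch no counting happens while the array is being built; the only scan is the one the conversion performs.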