This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Fixed statistics writing flag and correct null_count in dictionaries #1414

Merged · 2 commits · Feb 22, 2023


ritchie46
Collaborator

This ensures the Parquet writer respects the write_statistics == false option. It also corrects the null_count for dictionary arrays: the reported null count was that of the dictionary values instead of that of the dictionary array itself.
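
To illustrate the two fixes described above, here is a minimal sketch in plain Rust. It does not use arrow2's real types: `DictArray`, `values_null_count`, `array_null_count`, `null_count_statistic`, and `sample` are hypothetical names introduced only for this example, and the sketch assumes that a null slot in a dictionary array is represented by a null key.

```rust
/// Hypothetical, simplified stand-in for a dictionary-encoded array.
/// This is NOT arrow2's `DictionaryArray`; it only models the idea.
struct DictArray {
    /// One key per logical row; `None` marks a null row (assumption).
    keys: Vec<Option<usize>>,
    /// The distinct values that the keys index into.
    values: Vec<Option<&'static str>>,
}

impl DictArray {
    /// Null count of the dictionary *values* (what was reported before the fix).
    fn values_null_count(&self) -> usize {
        self.values.iter().filter(|v| v.is_none()).count()
    }

    /// Null count of the dictionary *array*, i.e. of its rows.
    fn array_null_count(&self) -> usize {
        self.keys.iter().filter(|k| k.is_none()).count()
    }
}

/// Returns the null-count statistic only when statistics are requested,
/// mirroring the idea of respecting a `write_statistics` flag.
fn null_count_statistic(array: &DictArray, write_statistics: bool) -> Option<usize> {
    if write_statistics {
        Some(array.array_null_count())
    } else {
        None
    }
}

/// Illustrative data: four rows, one of them null, over two non-null values.
fn sample() -> DictArray {
    DictArray {
        keys: vec![Some(0), None, Some(1), Some(0)],
        values: vec![Some("a"), Some("b")],
    }
}
```

With this sample, the values contain no nulls while the array has one null row, so reporting the values' count would understate the statistic. When the flag is false, no statistic is produced at all.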

@@ -591,7 +591,7 @@ pub fn pyarrow_nullable_statistics(column: &str) -> Statistics {

     Statistics {
         distinct_count: UInt64Array::from([None]).boxed(),
-        null_count: UInt64Array::from([Some(0)]).boxed(),
+        null_count: UInt64Array::from([Some(1)]).boxed(),
Collaborator Author

The previous expected value in this test was invalid. I debugged it and clearly found a null_count of 1.

codecov bot commented Feb 22, 2023

Codecov Report

Base: 83.61% // Head: 83.65% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (bbeb28a) compared to base (615300b).
Patch coverage: 57.37% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1414      +/-   ##
==========================================
+ Coverage   83.61%   83.65%   +0.03%     
==========================================
  Files         373      373              
  Lines       40326    40340      +14     
==========================================
+ Hits        33720    33747      +27     
+ Misses       6606     6593      -13     
Impacted Files                        Coverage Δ
src/io/parquet/write/dictionary.rs    82.96% <57.37%> (-3.94%) ⬇️
src/array/binary/mod.rs               92.61% <0.00%> (-1.48%) ⬇️
src/bitmap/utils/slice_iterator.rs    97.56% <0.00%> (-1.22%) ⬇️
src/array/utf8/mod.rs                 83.98% <0.00%> (-1.18%) ⬇️
src/io/ipc/read/stream_async.rs       76.02% <0.00%> (-0.69%) ⬇️
src/io/ipc/read/file.rs               96.87% <0.00%> (+0.44%) ⬆️
src/io/ipc/read/file_async.rs         72.01% <0.00%> (+10.82%) ⬆️

jorgecarleitao changed the title from "fix(parquet): respect statistics and correct null_count" to "Fixed statistics writing flag and correct null_count in dictionaries" on Feb 22, 2023
jorgecarleitao added the "bug" (Something isn't working) label on Feb 22, 2023
jorgecarleitao merged commit c877287 into jorgecarleitao:main on Feb 22, 2023
ritchie46 deleted the dictionary_statistics branch on February 22, 2023 15:00