
Simplify BitReader (~5-10% faster) #2381

Merged: 5 commits merged into apache:master on Aug 15, 2022

Conversation

@tustvold (Contributor, author) commented on Aug 8, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

This makes a couple of changes, listed below under "What changes are included in this PR?". Benchmark results:

arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                                                                             
                        time:   [26.773 us 26.778 us 26.784 us]
                        change: [-12.530% -12.270% -11.986%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  3 (3.00%) high mild
  2 (2.00%) high severe
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                                                                             
                        time:   [43.490 us 43.501 us 43.513 us]
                        change: [-8.0569% -7.7694% -7.4579%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                                                                             
                        time:   [50.091 us 50.115 us 50.142 us]
                        change: [-3.0207% -2.7299% -2.4358%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                                                                             
                        time:   [23.427 us 23.432 us 23.436 us]
                        change: [-7.3450% -7.2770% -7.2229%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high severe
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                                                                             
                        time:   [40.159 us 40.169 us 40.179 us]
                        change: [-4.6524% -4.3771% -4.1016%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                                                                             
                        time:   [49.294 us 49.312 us 49.330 us]
                        change: [-1.3314% -1.1008% -0.9611%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

arrow_array_reader/StringArray/plain encoded, mandatory, no NULLs                                                                            
                        time:   [171.83 us 171.96 us 172.09 us]
                        change: [-3.5398% -3.2856% -3.0931%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
arrow_array_reader/StringArray/plain encoded, optional, no NULLs                                                                            
                        time:   [189.01 us 189.08 us 189.16 us]
                        change: [-3.5964% -3.4081% -3.2798%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
arrow_array_reader/StringArray/plain encoded, optional, half NULLs                                                                            
                        time:   [211.78 us 211.91 us 212.03 us]
                        change: [-4.9192% -4.6823% -4.5268%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs                                                                            
                        time:   [134.94 us 135.22 us 135.51 us]
                        change: [-1.3736% -1.0797% -0.8383%] (p = 0.00 < 0.05)
                        Change within noise threshold.
arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs                                                                            
                        time:   [150.57 us 150.81 us 151.06 us]
                        change: [-1.1385% -0.8209% -0.5214%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs                                                                            
                        time:   [188.25 us 188.37 us 188.49 us]
                        change: [-5.6464% -5.3275% -5.0459%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs                                                                             
                        time:   [24.404 us 24.413 us 24.423 us]
                        change: [-1.8395% -1.5543% -1.2527%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs                                                                             
                        time:   [41.200 us 41.229 us 41.271 us]
                        change: [-1.9306% -1.5861% -1.2388%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs                                                                             
                        time:   [48.046 us 48.075 us 48.104 us]
                        change: [-7.1111% -6.7512% -6.3610%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

What changes are included in this PR?

  1. Removes `total_bytes`, making it easier for LLVM to elide bounds checks
  2. Lazily populates `buffered_values` only when actually needed
  3. Simplifies the `skip` implementation

The second of these is particularly impactful for DeltaBitPackedDecoder, where the miniblocks are in chunks of 32 and therefore never actually fall out of byte alignment.
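
As a rough illustration of the shape these changes leave the reader in, here is a hedged sketch of the state involved (field names follow the diff snippets quoted in the review below; the buffer type here is a stand-in for illustration, not necessarily the crate's actual type):

    struct BitReader {
        // The full byte buffer; `buffer.len()` replaces the removed `total_bytes`.
        buffer: Vec<u8>,
        // Up to 8 bytes of `buffer`, now loaded lazily only when a read needs them.
        buffered_values: u64,
        // Byte position within `buffer` at which `buffered_values` was loaded.
        byte_offset: usize,
        // Bit position within `buffered_values` (0..=63).
        bit_offset: usize,
    }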

Are there any user-facing changes?

No

@github-actions bot added the `parquet` label (Changes to the parquet crate) on Aug 8, 2022
}

true
self.skip(1, num_bits) == 1
@tustvold (Contributor, author): This isn't actually called by anything I can find, so let's just keep it simple.
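
A hedged sketch of the simplified method the quoted hunk appears to come from (the function name `skip_one` here is illustrative, not necessarily the name used in the crate; only the one-line body is taken from the diff):

    /// Skip a single value of `num_bits` bits, returning true if a value was skipped.
    fn skip_one(&mut self, num_bits: usize) -> bool {
        // `skip` reports how many values it actually skipped, so skipping one
        // value succeeds exactly when it returns 1.
        self.skip(1, num_bits) == 1
    }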

}

// TODO: better to avoid copying here
Some(from_ne_slice(v.as_bytes()))
}

/// Skip one value of size `num_bits`.
@tustvold (Contributor, author): Removed as no longer needed; it was only ever called by the implementation of BitReader::skip, which now does something simpler.

@alamb (Contributor) left a comment:

Thank you @tustvold

Seems like a nice improvement to me, but I did not follow all the bit_offset updating logic (though it seems like most of my confusion relates to logic that predates this PR).

cc @sunchao in case you are interested

let bytes_to_read = cmp::min(self.total_bytes - self.byte_offset, 8);
/// Populates `self.buffered_values`
#[inline]
fn load_buffer(&mut self) {
@alamb (Contributor): Given there is a field called buffer, and this function is not loading it but rather reading from it and loading into buffered_values, I recommend calling this function something slightly more verbose:

Suggested change
fn load_buffer(&mut self) {
/// Loads up to the next 8 bytes from `self.buffer` at `self.byte_offset`
/// into `self.buffered_values`. Reads fewer than 8 bytes if there are fewer than 8 bytes left
fn load_buffered_values(&mut self) {
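
For context, a hedged sketch of what such a byte-aligned load might look like after this PR (the `bytes_to_read` computation mirrors the quoted line with `total_bytes` replaced by `buffer.len()`; the rest is an assumption, not necessarily the merged code):

    fn load_buffered_values(&mut self) {
        // Read at most 8 bytes, fewer if the buffer is nearly exhausted.
        let bytes_to_read = std::cmp::min(self.buffer.len() - self.byte_offset, 8);
        let mut bytes = [0u8; 8];
        bytes[..bytes_to_read].copy_from_slice(
            &self.buffer[self.byte_offset..self.byte_offset + bytes_to_read],
        );
        // Interpret the (zero-padded) bytes as a little-endian u64.
        self.buffered_values = u64::from_le_bytes(bytes);
    }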

@@ -429,7 +406,7 @@ impl BitReader {

let mut values_to_read = batch.len();
let needed_bits = num_bits * values_to_read;
let remaining_bits = (self.total_bytes - self.byte_offset) * 8 - self.bit_offset;
let remaining_bits = (self.buffer.len() - self.byte_offset) * 8 - self.bit_offset;
@alamb (Contributor): I don't think this PR is any better or worse, but what ensures that self.bit_offset (which can be up to 63) is always less than self.buffer.len() - self.byte_offset?

@tustvold (Contributor, author): The logic on the next line, and similar variants of it. We only read a number of bits based on what remains.
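
A hedged sketch of the pattern being referred to; the first three lines are from the quoted hunk, while the clamp that follows is an assumption about the neighbouring code rather than a verbatim quote:

    let mut values_to_read = batch.len();
    let needed_bits = num_bits * values_to_read;
    let remaining_bits = (self.buffer.len() - self.byte_offset) * 8 - self.bit_offset;
    // Never hand out more values than the remaining bits can supply, so the
    // subtraction above cannot be driven negative by a later read.
    if remaining_bits < needed_bits {
        values_to_read = remaining_bits / num_bits;
    }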

return None;
}

// Only need to populate buffer if byte aligned
@alamb (Contributor): I don't understand this comment -- maybe some more context would help about why the buffer needs to be loaded if there are no more bits left in the current byte

self.reload_buffer_values();
v |= trailing_bits(self.buffered_values, self.bit_offset)
.wrapping_shl((num_bits - self.bit_offset) as u32);
if self.bit_offset != 0 {
@alamb (Contributor): The logic that decides when to reload self.buffered_values is confusing to me, as I would normally expect calling self.load_buffer() to also reset bit_offset to 0, to match having just reloaded more bits into self.buffered_values.
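
For orientation, a hedged reconstruction of the straddling-read path the quoted snippet comes from (the `reload_buffer_values` call and the `trailing_bits`/`wrapping_shl` lines appear in the hunk above; the surrounding control flow is an assumption and may not match the merged code exactly):

    // Take the bits of the value that are still available in the current
    // 64-bit buffer.
    let mut v = trailing_bits(self.buffered_values >> self.bit_offset, num_bits);
    self.bit_offset += num_bits;
    if self.bit_offset >= 64 {
        // The value straddled buffered_values: advance to the next 8 bytes,
        // reload, and OR in the remaining high bits of the value.
        self.byte_offset += 8;
        self.bit_offset -= 64;
        self.reload_buffer_values();
        v |= trailing_bits(self.buffered_values, self.bit_offset)
            .wrapping_shl((num_bits - self.bit_offset) as u32);
    }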

@tustvold merged commit 47a2c21 into apache:master on Aug 15, 2022
@ursabot commented on Aug 15, 2022

Benchmark runs are scheduled for baseline = 76e79d9 and contender = 47a2c21. 47a2c21 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
