Refactor `utils::unpack_bits` for palette expanded images #405

Conversation
We could do something like this to special case `bit_depth == 8`:

```rust
if bit_depth == 8 {
    for (&curr, chunk) in row.iter().zip(&mut buf_chunks) {
        func(curr, chunk);
    }
} else {
    let mask = ((1u16 << bit_depth) - 1) as u8;
    for &curr in row.iter() {
        let mut shift = 8 - bit_depth as i32;
        while shift >= 0 {
            if let Some(chunk) = buf_chunks.next() {
                let pixel = (curr >> shift) & mask;
                func(pixel, chunk);
            } else {
                return;
            }
            shift -= bit_depth as i32;
        }
    }
}
```

I also tried this for the 8-bit depth cases for 3 and 4 channels and played with the constant size. It seemed slightly faster on the benchmark, but not enough to show up green on criterion (a rough benchmark sketch follows below). The 3-byte palette lookup for each pixel seems to make it hard to find low-hanging fruit for improvement.

```rust
const CHUNK_SIZE: usize = 32;
let mut buf_chunks = buf.chunks_exact_mut(channels * CHUNK_SIZE);
let mut row_chunks = row.chunks_exact(CHUNK_SIZE);
for (buf_chunk, row_chunk) in (&mut buf_chunks).zip(&mut row_chunks) {
    row_chunk
        .iter()
        .zip(buf_chunk.chunks_exact_mut(channels))
        .for_each(|(&curr, chunk)| func(curr, chunk));
}
(buf_chunks.into_remainder().chunks_exact_mut(channels))
    .zip(row_chunks.remainder().iter())
    .for_each(|(chunk, &curr)| func(curr, chunk));
```

I put […] For testing, […]
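For context, a micro-benchmark along these lines could be wired up with criterion roughly as follows. This is a minimal, self-contained sketch assuming a `benches/` target with `harness = false` and a criterion dev-dependency; `expand_palette_chunked` and the synthetic palette/row sizes are stand-ins for illustration, not the crate's actual code.

```rust
// Sketch of a criterion micro-benchmark for the chunked 8-bit expansion loop.
// `expand_palette_chunked` is a stand-alone stand-in, not the crate's function.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn expand_palette_chunked(row: &[u8], buf: &mut [u8], palette: &[u8]) {
    const CHUNK_SIZE: usize = 32;
    let channels = 3;
    let mut buf_chunks = buf.chunks_exact_mut(channels * CHUNK_SIZE);
    let mut row_chunks = row.chunks_exact(CHUNK_SIZE);
    for (buf_chunk, row_chunk) in (&mut buf_chunks).zip(&mut row_chunks) {
        row_chunk
            .iter()
            .zip(buf_chunk.chunks_exact_mut(channels))
            .for_each(|(&idx, chunk)| {
                let i = idx as usize * channels;
                chunk.copy_from_slice(&palette[i..i + channels]);
            });
    }
    (buf_chunks.into_remainder().chunks_exact_mut(channels))
        .zip(row_chunks.remainder().iter())
        .for_each(|(chunk, &idx)| {
            let i = idx as usize * channels;
            chunk.copy_from_slice(&palette[i..i + channels]);
        });
}

fn bench_expand(c: &mut Criterion) {
    // Synthetic data: a 256-entry grayscale RGB palette and a 4096-pixel row of indices.
    let palette: Vec<u8> = (0u16..256).flat_map(|i| [i as u8; 3]).collect();
    let row: Vec<u8> = (0u32..4096).map(|i| (i % 256) as u8).collect();
    let mut buf = vec![0u8; row.len() * 3];
    c.bench_function("expand_palette_chunked", |b| {
        b.iter(|| {
            expand_palette_chunked(
                black_box(row.as_slice()),
                black_box(buf.as_mut_slice()),
                black_box(palette.as_slice()),
            )
        })
    });
}

criterion_group!(benches, bench_expand);
criterion_main!(benches);
```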
Added a few comments, but generally looks good to me!
src/utils.rs (outdated)
```rust
                let pixel = (curr >> shift) & mask;
                func(pixel, chunk);
            } else {
                return;
```
Slightly nervous about silently returning here. If we expect that `row` and `buf` will always have matching lengths (which I think we do?), then we should `unwrap` / `expect` instead of returning.
That makes sense.

In order to change this, I had to swap from iterating over the row in the outer loop to iterating over the chunks (sketched below). When the shift cycle resets within the loop, we request the next `row` value, and if it doesn't exist then it's an error.

Following that thought process, should the `bit_depth == 8` branch arm check for `buf_chunks` being empty after the loop? Then assert or return an error if it isn't.

```rust
assert!(buf_chunks.next().is_none());
```
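A sketch of that swapped iteration order might look like the following; the function name, signature, and `expect` message are assumptions for illustration, not the PR's final code.

```rust
// Sketch: drive the loop from the output chunks and pull input bytes on demand,
// panicking if `row` runs out before `buf` is filled. Illustrative only.
fn unpack_subbyte<F>(row: &[u8], buf: &mut [u8], bit_depth: u8, channels: usize, func: F)
where
    F: Fn(u8, &mut [u8]),
{
    let mask = ((1u16 << bit_depth) - 1) as u8;
    let mut row_iter = row.iter();
    let mut curr = 0u8;
    let mut shift = -1i32; // forces fetching the first input byte
    for chunk in buf.chunks_exact_mut(channels) {
        if shift < 0 {
            // The shift cycle reset: request the next row value; missing data is a bug.
            curr = *row_iter.next().expect("`row` too short for `buf`");
            shift = 8 - bit_depth as i32;
        }
        func((curr >> shift) & mask, chunk);
        shift -= bit_depth as i32;
    }
}
```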
How about asserting that `input.len() * (8 / bit_depth) * channels == output.len()` at the start of the method, and then the iteration order doesn't need to be swapped? Or is the problem that the check would fail because some of the call sites actually pass an output buffer that's larger than required?
Yes, the checks fail. The input and output buffer sizes seem to vary enough that I can't easily derive an assertion equality that would work.
I tried to figure out where these values came from and realized `rowlen` and `output_line_size` can be different values.

`rowlen` comes from `Reader::next_pass` and is equivalent to `self.subframe.rowlen`.

`output_line_size` is the result of `Reader::output_line_size`, which calls into `ColorType::raw_row_length_from_width` with the width argument being `self.subframe.width` for non-Adam7 images.
Lines 57 to 69 in 7642f0f
```rust
pub(crate) fn raw_row_length_from_width(self, depth: BitDepth, width: u32) -> usize {
    let samples = width as usize * self.samples();
    1 + match depth {
        BitDepth::Sixteen => samples * 2,
        BitDepth::Eight => samples,
        subbyte => {
            let samples_per_byte = 8 / subbyte as usize;
            let whole = samples / samples_per_byte;
            let fract = usize::from(samples % samples_per_byte > 0);
            whole + fract
        }
    }
}
```
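As a concrete illustration of why those two lengths differ (numbers made up for this example, and glossing over exactly where the filter byte is added or dropped): a 10-pixel-wide, 1-bit indexed row packs into a couple of bytes, while the palette-expanded RGB output line is an order of magnitude larger, so no single equality relates them across call sites.

```rust
// Hypothetical numbers showing how the packed row length and the palette-expanded
// output length diverge for an indexed image, mirroring raw_row_length_from_width.
fn main() {
    let width = 10usize; // pixels in the (sub)frame row
    let bit_depth = 1usize; // 1-bit palette indices
    let channels = 3usize; // each index expands to RGB

    let samples = width; // times samples(), which is 1 for indexed color
    let samples_per_byte = 8 / bit_depth;
    let whole = samples / samples_per_byte;
    let fract = usize::from(samples % samples_per_byte > 0);

    let raw_row_len = 1 + whole + fract; // filter byte + packed index bytes
    let expanded_len = width * channels; // output line after palette expansion

    assert_eq!(raw_row_len, 3); // 1 + ceil(10 / 8)
    assert_eq!(expanded_len, 30); // 10 pixels * 3 channels
}
```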
Writing out a longer answer of speculation, I think I came across a solution.

We want the input buffer to have at least as many shifts as there are output entries. Then the input row will always have enough lookups for the output buffer chunks.

```rust
let input_max_entries = input.len() * (8 / bit_depth as usize);
let output_entries = output.len() / channels;
assert!(input_max_entries >= output_entries);
```

So the assert you wrote out was correct, we just need to change from `==` to `>=` and it works.

```rust
assert!(input.len() * (8 / bit_depth as usize) * channels >= output.len());
```

Is multiplication overflow a concern? Should `saturating_mul` be used?
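For reference, an overflow-safe variant of that check could look like the sketch below; the function name and message are placeholders, not the crate's API. A saturated left-hand side still satisfies `>=`, which is the conservative direction here, since the true product would only be larger.

```rust
// Illustrative sketch of the length check with saturating multiplication.
fn check_unpack_lengths(input: &[u8], output: &[u8], bit_depth: u8, channels: usize) {
    let max_unpacked_bytes = input
        .len()
        .saturating_mul(8 / bit_depth as usize)
        .saturating_mul(channels);
    assert!(
        max_unpacked_bytes >= output.len(),
        "output buffer is larger than the input row can fill"
    );
}
```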
I'd say special casing `bit_depth = 8` does make sense, but it's fine for a followup if you'd prefer.
Remove `copy_from_slice` in palette match arms of `next_interlaced_row_impl`
- Reuse the previous row in `unpack_bits` calculation instead of expanding the palette within the buffer
- Use `chunks_exact` and iterate from start to end of the buffer instead of back to front in-place

Add assert for `bit_depth` in `unpack_bits`
- Special case for `BitDepth::Eight`
Does it make sense to relocate […]? With that change, […]
Moving the method to […]

However, I'd prefer not to make the method return errors. The crate should already be validating any user-provided input buffers and the image bit depth in other places, so if either of those things are wrong in […]

This is admittedly a painful tradeoff: having the code panic means we learn about bugs faster and they're often a bit easier to fix, but at the same time it means they potentially have a higher user impact until they're fixed (which can sometimes take quite a while).
Thanks for explaining the philosophy of errors vs panics for this situation. I saw a (possibly) redundant check nearby in […] (lines 819 to 828 in 7642f0f).
Ensure that the input buffer can produce enough bit shifts per input entry to match or exceed the length of the output buffer.
- Reuse the previous row in `unpack_bits` calculation instead of expanding the palette within the buffer
- Use `chunks_exact` and iterate from start to end of the buffer instead of back to front in-place (contrasted in the sketch below)

I found this code easier to reason about, and it might enable someone to optimize this further in the future. I didn't see a performance difference between the two versions.
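To make the "back to front in-place" contrast concrete, the general shape of that older pattern looks something like the sketch below (illustrative only; the buffer layout and names are assumptions, not the crate's previous code verbatim). Because the packed indices and the expanded pixels share one buffer and expansion grows the data, it has to run from the end so unread indices aren't overwritten, which is what the new two-buffer, front-to-back version avoids.

```rust
// Sketch of in-place, back-to-front palette expansion: indices occupy buf[..pixels]
// and the expanded RGB pixels end up occupying buf[..pixels * 3] of the same buffer.
fn expand_in_place_back_to_front(buf: &mut [u8], pixels: usize, palette: &[u8]) {
    let channels = 3;
    for i in (0..pixels).rev() {
        let idx = buf[i] as usize * channels; // read the packed index before it can be clobbered
        let dst = i * channels;
        buf[dst..dst + channels].copy_from_slice(&palette[idx..idx + channels]);
    }
}
```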