
Refactor utils::unpack_bits for palette expanded images #405

Merged — 3 commits, merged into image-rs:master from okaneco:unpack-bits on Aug 27, 2023

Conversation

@okaneco (Contributor) commented Aug 8, 2023

- Reuse the previous row in the unpack_bits calculation instead of expanding the palette within the buffer
- Use chunks_exact and iterate from the start to the end of the buffer instead of back to front in place


I found this code easier to reason about, and it might enable someone to optimize this further in the future. I didn't see a performance difference between the two versions.
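As a rough illustration of the front-to-back approach, here is a simplified, self-contained sketch; the signature and the grayscale-expansion closure are assumptions for the example, not the actual patch:

```rust
// Simplified sketch: unpack sub-byte samples from `row` front to back,
// writing one expanded pixel per `channels`-sized output chunk via `func`.
fn unpack_bits<F>(row: &[u8], buf: &mut [u8], channels: usize, bit_depth: u8, func: F)
where
    F: Fn(u8, &mut [u8]),
{
    let mask = ((1u16 << bit_depth) - 1) as u8;
    let mut buf_chunks = buf.chunks_exact_mut(channels);

    'outer: for &byte in row {
        // Extract samples from the high bits down to the low bits.
        let mut shift = 8 - bit_depth as i32;
        while shift >= 0 {
            let Some(chunk) = buf_chunks.next() else { break 'outer };
            func((byte >> shift) & mask, chunk);
            shift -= bit_depth as i32;
        }
    }
}

fn main() {
    // Expand four 2-bit samples (packed into one byte) to 8-bit grayscale.
    let row = [0b00_01_10_11u8];
    let mut buf = [0u8; 4];
    unpack_bits(&row, &mut buf, 1, 2, |px, chunk| chunk[0] = px * 85);
    assert_eq!(buf, [0, 85, 170, 255]);
}
```

Iterating the output with `chunks_exact_mut` keeps the bounds check out of the inner loop, which is part of what makes the forward version easier to reason about than the in-place back-to-front expansion.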

(resolved review thread on src/decoder/mod.rs, now outdated)
@okaneco (Author) commented Aug 10, 2023

We could do something like this to special-case BitDepth::Eight, since it doesn't need to calculate the mask and shift.

if bit_depth == 8 {
    for (&curr, chunk) in row.iter().zip(&mut buf_chunks) {
        func(curr, chunk);
    }
} else {
    let mask = ((1u16 << bit_depth) - 1) as u8;

    for &curr in row.iter() {
        let mut shift = 8 - bit_depth as i32;

        while shift >= 0 {
            if let Some(chunk) = buf_chunks.next() {
                let pixel = (curr >> shift) & mask;
                func(pixel, chunk);
            } else {
                return;
            }

            shift -= bit_depth as i32;
        }
    }
}

I also tried this for the 8-bit depth cases with 3 and 4 channels and played with the constant size. It seemed slightly faster on the benchmark, but not enough to show up green in Criterion. The 3-byte palette lookup for each pixel seems to make it hard to find low-hanging fruit for improvement.

const CHUNK_SIZE: usize = 32;

let mut buf_chunks = buf.chunks_exact_mut(channels * CHUNK_SIZE);
let mut row_chunks = row.chunks_exact(CHUNK_SIZE);

for (buf_chunk, row_chunk) in (&mut buf_chunks).zip(&mut row_chunks) {
    row_chunk
        .iter()
        .zip(buf_chunk.chunks_exact_mut(channels))
        .for_each(|(&curr, chunk)| func(curr, chunk));
}

(buf_chunks.into_remainder().chunks_exact_mut(channels))
    .zip(row_chunks.remainder().iter())
    .for_each(|(chunk, &curr)| func(curr, chunk));

I put speed_bench_palette.png into the tests/benches/ folder and used cargo bench -- speed_bench
https://github.com/etemesi254/zune-image/tree/dev/test-images/png/benchmarks

For testing, cargo test -- render will run the tests that are most likely to fail without running the whole suite.

@fintelia (Contributor) left a comment:

Added a few comments, but generally looks good to me!

(two resolved review threads on src/utils.rs)

src/utils.rs (outdated diff context):
let pixel = (curr >> shift) & mask;
func(pixel, chunk);
} else {
return;
@fintelia:

Slightly nervous about silently returning here. If we expect that row and buf will always have matching lengths (which I think we do?), then we should unwrap / expect instead of returning

@okaneco (Author) replied Aug 18, 2023:

That makes sense.

In order to change this, I had to swap from iterating over the row in the outer loop to iterating over the chunks. When the shift cycle resets within the loop, we request the next row value, and if it doesn't exist, that's an error.

Following that thought process, should the bit_depth == 8 branch arm check for buf_chunks being empty after the loop? Then assert or return an error if it isn't.

assert!(buf_chunks.next().is_none());

@fintelia:

How about asserting that input.len() * (8/bit_depth) * channels == output.len() at the start of the method, and then the iteration order doesn't need to be swapped? Or is the problem that the check would fail because some of the call sites actually pass an output buffer that's larger than required?

@okaneco (Author) replied:

Yes, the checks fail. The input and output buffer sizes seem to vary enough that I can't easily derive an assertion equality that would work.

@fintelia replied:

That's weird that the checks fail, given that we intentionally set the sizes of the two buffers:

output_buffer.resize(output_line_size, 0u8);

let row = &self.prev[1..rowlen];

@okaneco (Author) replied Aug 19, 2023:

I tried to figure out where these values came from and realized rowlen and output_line_size can be different values.

rowlen comes from Reader::next_pass and is equivalent to self.subframe.rowlen.

output_line_size is the result of Reader::output_line_size which calls into ColorType::raw_row_length_from_width with the width argument being self.subframe.width for non-Adam7 images.

image-png/src/common.rs, lines 57 to 69 at 7642f0f:

pub(crate) fn raw_row_length_from_width(self, depth: BitDepth, width: u32) -> usize {
    let samples = width as usize * self.samples();
    1 + match depth {
        BitDepth::Sixteen => samples * 2,
        BitDepth::Eight => samples,
        subbyte => {
            let samples_per_byte = 8 / subbyte as usize;
            let whole = samples / samples_per_byte;
            let fract = usize::from(samples % samples_per_byte > 0);
            whole + fract
        }
    }
}
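Evaluating that formula for a small indexed image makes the divergence concrete. This is a hypothetical standalone helper for illustration, restricted to one sample per pixel (ColorType::Indexed), mirroring the sub-byte and eight-bit branches quoted above:

```rust
// Sketch of raw_row_length_from_width for an indexed image
// (1 sample per pixel). The leading 1 is the per-row filter byte.
fn raw_row_length_indexed(bit_depth: usize, width: usize) -> usize {
    let samples = width;
    let samples_per_byte = 8 / bit_depth;
    let whole = samples / samples_per_byte;
    let fract = usize::from(samples % samples_per_byte > 0);
    1 + whole + fract
}

fn main() {
    // A 10-pixel-wide, 1-bit indexed row: 1 filter byte + ceil(10/8) = 2
    // data bytes = 3 total. Those 2 data bytes can yield up to 16 shifted
    // samples, while only 10 output pixels are needed -- which is why a
    // `>=` capacity check fits where a strict `==` does not.
    assert_eq!(raw_row_length_indexed(1, 10), 3);
    // At 8 bits per sample the row is exactly 1 + 10 bytes.
    assert_eq!(raw_row_length_indexed(8, 10), 11);
}
```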


While writing out a longer, speculative answer, I think I came across a solution.

We want the input buffer to have at least as many shifts as there are output entries. Then, the input row will always have enough lookups for the output buffer chunks.

let input_max_entries = input.len() * (8 / bit_depth as usize);
let output_entries = output.len() / channels;
assert!(input_max_entries >= output_entries);

So the assert you wrote out was correct; we just need to change == to >= and it works.

assert!(input.len() * (8 / bit_depth as usize) * channels >= output.len());

Is multiplication overflow a concern? Should saturating_mul be used?
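An overflow-safe version of that check could look something like the following sketch; the free-function shape and names are hypothetical, since the real assert would be inlined in unpack_bits:

```rust
// Sketch of the proposed length check using saturating_mul so a
// pathological input length cannot wrap the multiplication.
fn check_lengths(input_len: usize, output_len: usize, bit_depth: usize, channels: usize) {
    let max_entries = input_len
        .saturating_mul(8 / bit_depth)
        .saturating_mul(channels);
    assert!(max_entries >= output_len);
}

fn main() {
    // 2 input bytes at 2 bits/sample give 8 samples; with 3 channels
    // they can fill up to 24 output bytes.
    check_lengths(2, 24, 2, 3);
    // Saturation keeps even a usize::MAX input length from wrapping:
    // the product clamps to usize::MAX, so the assert still holds.
    check_lengths(usize::MAX, usize::MAX, 1, 3);
}
```

With saturation, an impossibly large input only makes the check more permissive, never panics spuriously from wraparound, which matches the intent of a `>=` capacity assertion.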

@fintelia (Contributor) replied:

I'd say special-casing bit_depth = 8 does make sense, but it's fine as a follow-up if you'd prefer.

Commits:

- Remove copy_from_slice in palette match arms of next_interlaced_row_impl
  (Reuse the previous row in the unpack_bits calculation instead of expanding the palette within the buffer; use chunks_exact and iterate from start to end of the buffer instead of back to front in place)
- Add assert for `bit_depth` in `unpack_bits`
- Special case for `BitDepth::Eight`
@okaneco (Author) commented Aug 18, 2023

Does it make sense to relocate src/utils.rs to the decoder module in src/decoder? The functions inside it are only used in the decoder.

With that change, unpack_bits would be able to return errors instead of having to panic or assert. The assertion could instead return FormatErrorInner::InvalidBitDepth but I'm not sure there's a proper error for the input buffer being out of data.

@fintelia replied:

Moving the method to src/decoder seems reasonable.

However, I'd prefer not to make the method return errors. The crate should already be validating any user-provided input buffers and the image bit depth in other places, so if either of those things are wrong in unpack_bits it means there's a bug in our code. By triggering a panic, we get a backtrace any time things go wrong and (if we're lucky) can detect the issue from fuzzing rather than waiting for user bug reports about incorrectly decoded images.

This is admittedly a painful tradeoff: having the code panic means we learn about bugs faster, and they're often a bit easier to fix, but it also means they can have a higher user impact until they're fixed (which can sometimes take quite a while).

@okaneco (Author) commented Aug 19, 2023

Thanks for explaining the philosophy of errors vs. panics for this situation. I saw a (possibly) redundant check nearby in expand_paletted and thought perhaps errors might have been wanted further down as well. But it also makes sense that you want to panic in such an internal function, or discover it with fuzzing coverage, because a failure there indicates a bug.

if let BitDepth::Sixteen = info.bit_depth {
    // This should have been caught earlier but let's check again. Can't hurt.
    Err(DecodingError::Format(
        FormatErrorInner::InvalidColorBitDepth {
            color_type: ColorType::Indexed,
            bit_depth: BitDepth::Sixteen,
        }
        .into(),
    ))
} else {

Commit: Ensure that the input buffer can produce enough bit shifts per input entry to match or exceed the length of the output buffer.
@fintelia fintelia merged commit ae5dee9 into image-rs:master Aug 27, 2023
@okaneco okaneco deleted the unpack-bits branch August 27, 2023 19:09