Refactor `utils::unpack_bits` for palette expanded images #405

Conversation
We could do something like this to special case `bit_depth == 8`:

```rust
if bit_depth == 8 {
    for (&curr, chunk) in row.iter().zip(&mut buf_chunks) {
        func(curr, chunk);
    }
} else {
    let mask = ((1u16 << bit_depth) - 1) as u8;
    for &curr in row.iter() {
        let mut shift = 8 - bit_depth as i32;
        while shift >= 0 {
            if let Some(chunk) = buf_chunks.next() {
                let pixel = (curr >> shift) & mask;
                func(pixel, chunk);
            } else {
                return;
            }
            shift -= bit_depth as i32;
        }
    }
}
```

I also tried this for the 8-bit depth cases for 3 and 4 channels and played with the constant size. It seemed slightly faster on the benchmark, but not enough to show up green on criterion (a rough benchmark sketch follows below). The 3-byte palette lookup for each pixel seems to make it hard to find low-hanging fruit for improvement.

```rust
const CHUNK_SIZE: usize = 32;
let mut buf_chunks = buf.chunks_exact_mut(channels * CHUNK_SIZE);
let mut row_chunks = row.chunks_exact(CHUNK_SIZE);
for (buf_chunk, row_chunk) in (&mut buf_chunks).zip(&mut row_chunks) {
    row_chunk
        .iter()
        .zip(buf_chunk.chunks_exact_mut(channels))
        .for_each(|(&curr, chunk)| func(curr, chunk));
}
(buf_chunks.into_remainder().chunks_exact_mut(channels))
    .zip(row_chunks.remainder().iter())
    .for_each(|(chunk, &curr)| func(curr, chunk));
```

I put […] For testing, […]
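For context, a micro-benchmark along these lines could be wired up with criterion roughly as follows. This is a minimal, self-contained sketch assuming a `benches/` target with `harness = false` and a criterion dev-dependency; `expand_palette_chunked` and the synthetic palette/row sizes are stand-ins for illustration, not the crate's actual code.

```rust
// Sketch of a criterion micro-benchmark for the chunked 8-bit expansion loop.
// `expand_palette_chunked` is a stand-alone stand-in, not the crate's function.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn expand_palette_chunked(row: &[u8], buf: &mut [u8], palette: &[u8]) {
    const CHUNK_SIZE: usize = 32;
    let channels = 3;
    let mut buf_chunks = buf.chunks_exact_mut(channels * CHUNK_SIZE);
    let mut row_chunks = row.chunks_exact(CHUNK_SIZE);
    for (buf_chunk, row_chunk) in (&mut buf_chunks).zip(&mut row_chunks) {
        row_chunk
            .iter()
            .zip(buf_chunk.chunks_exact_mut(channels))
            .for_each(|(&idx, chunk)| {
                let i = idx as usize * channels;
                chunk.copy_from_slice(&palette[i..i + channels]);
            });
    }
    (buf_chunks.into_remainder().chunks_exact_mut(channels))
        .zip(row_chunks.remainder().iter())
        .for_each(|(chunk, &idx)| {
            let i = idx as usize * channels;
            chunk.copy_from_slice(&palette[i..i + channels]);
        });
}

fn bench_expand(c: &mut Criterion) {
    // Synthetic data: a 256-entry grayscale RGB palette and a 4096-pixel row of indices.
    let palette: Vec<u8> = (0u16..256).flat_map(|i| [i as u8; 3]).collect();
    let row: Vec<u8> = (0u32..4096).map(|i| (i % 256) as u8).collect();
    let mut buf = vec![0u8; row.len() * 3];
    c.bench_function("expand_palette_chunked", |b| {
        b.iter(|| {
            expand_palette_chunked(
                black_box(row.as_slice()),
                black_box(buf.as_mut_slice()),
                black_box(palette.as_slice()),
            )
        })
    });
}

criterion_group!(benches, bench_expand);
criterion_main!(benches);
```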
Added a few comments, but generally looks good to me!
src/utils.rs (outdated)
```rust
                let pixel = (curr >> shift) & mask;
                func(pixel, chunk);
            } else {
                return;
```
Slightly nervous about silently returning here. If we expect that `row` and `buf` will always have matching lengths (which I think we do?), then we should `unwrap` / `expect` instead of returning.
That makes sense.

In order to change this, I had to swap from iterating over the row in the outer loop to iterating over the chunks (sketched below). When the shift cycle resets within the loop, we request the next `row` value, and if it doesn't exist then it's an error.

Following that thought process, should the `bit_depth == 8` branch arm check for `buf_chunks` being empty after the loop? Then assert or return an error if it isn't.

```rust
assert!(buf_chunks.next().is_none());
```
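A sketch of that swapped iteration order might look like the following; the function name, signature, and `expect` message are assumptions for illustration, not the PR's final code.

```rust
// Sketch: drive the loop from the output chunks and pull input bytes on demand,
// panicking if `row` runs out before `buf` is filled. Illustrative only.
fn unpack_subbyte<F>(row: &[u8], buf: &mut [u8], bit_depth: u8, channels: usize, func: F)
where
    F: Fn(u8, &mut [u8]),
{
    let mask = ((1u16 << bit_depth) - 1) as u8;
    let mut row_iter = row.iter();
    let mut curr = 0u8;
    let mut shift = -1i32; // forces fetching the first input byte
    for chunk in buf.chunks_exact_mut(channels) {
        if shift < 0 {
            // The shift cycle reset: request the next row value; missing data is a bug.
            curr = *row_iter.next().expect("`row` too short for `buf`");
            shift = 8 - bit_depth as i32;
        }
        func((curr >> shift) & mask, chunk);
        shift -= bit_depth as i32;
    }
}
```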
How about asserting that `input.len() * (8 / bit_depth) * channels == output.len()` at the start of the method, and then the iteration order doesn't need to be swapped? Or is the problem that the check would fail because some of the call sites actually pass an output buffer that's larger than required?
Yes, the checks fail. The input and output buffer sizes seem to vary enough that I can't easily derive an assertion equality that would work.
I tried to figure out where these values came from and realized `rowlen` and `output_line_size` can be different values.

`rowlen` comes from `Reader::next_pass` and is equivalent to `self.subframe.rowlen`.

`output_line_size` is the result of `Reader::output_line_size`, which calls into `ColorType::raw_row_length_from_width` with the width argument being `self.subframe.width` for non-Adam7 images.
Lines 57 to 69 in 7642f0f
```rust
pub(crate) fn raw_row_length_from_width(self, depth: BitDepth, width: u32) -> usize {
    let samples = width as usize * self.samples();
    1 + match depth {
        BitDepth::Sixteen => samples * 2,
        BitDepth::Eight => samples,
        subbyte => {
            let samples_per_byte = 8 / subbyte as usize;
            let whole = samples / samples_per_byte;
            let fract = usize::from(samples % samples_per_byte > 0);
            whole + fract
        }
    }
}
```
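As a concrete illustration of why those two lengths differ (numbers made up for this example, and glossing over exactly where the filter byte is added or dropped): a 10-pixel-wide, 1-bit indexed row packs into a couple of bytes, while the palette-expanded RGB output line is an order of magnitude larger, so no single equality relates them across call sites.

```rust
// Hypothetical numbers showing how the packed row length and the palette-expanded
// output length diverge for an indexed image, mirroring raw_row_length_from_width.
fn main() {
    let width = 10usize; // pixels in the (sub)frame row
    let bit_depth = 1usize; // 1-bit palette indices
    let channels = 3usize; // each index expands to RGB

    let samples = width; // times samples(), which is 1 for indexed color
    let samples_per_byte = 8 / bit_depth;
    let whole = samples / samples_per_byte;
    let fract = usize::from(samples % samples_per_byte > 0);

    let raw_row_len = 1 + whole + fract; // filter byte + packed index bytes
    let expanded_len = width * channels; // output line after palette expansion

    assert_eq!(raw_row_len, 3); // 1 + ceil(10 / 8)
    assert_eq!(expanded_len, 30); // 10 pixels * 3 channels
}
```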
Writing out a longer answer of speculation, I think I came across a solution.

We want the input buffer to have at least as many shifts as there are output entries. Then the input row will always have enough lookups for the output buffer chunks.

```rust
let input_max_entries = input.len() * (8 / bit_depth as usize);
let output_entries = output.len() / channels;
assert!(input_max_entries >= output_entries);
```

So the assert you wrote out was correct, we just need to change from `==` to `>=` and it works.

```rust
assert!(input.len() * (8 / bit_depth as usize) * channels >= output.len());
```

Is multiplication overflow a concern? Should `saturating_mul` be used?
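For reference, an overflow-safe variant of that check could look like the sketch below; the function name and message are placeholders, not the crate's API. A saturated left-hand side still satisfies `>=`, which is the conservative direction here, since the true product would only be larger.

```rust
// Illustrative sketch of the length check with saturating multiplication.
fn check_unpack_lengths(input: &[u8], output: &[u8], bit_depth: u8, channels: usize) {
    let max_unpacked_bytes = input
        .len()
        .saturating_mul(8 / bit_depth as usize)
        .saturating_mul(channels);
    assert!(
        max_unpacked_bytes >= output.len(),
        "output buffer is larger than the input row can fill"
    );
}
```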
I'd say special casing `bit_depth = 8` does make sense, but it's fine for a followup if you'd prefer.
Remove `copy_from_slice` in palette match arms of `next_interlaced_row_impl`
- Reuse the previous row in `unpack_bits` calculation instead of expanding the palette within the buffer
- Use `chunks_exact` and iterate from start to end of the buffer instead of back to front in-place

Add assert for `bit_depth` in `unpack_bits`
- Special case for `BitDepth::Eight`
Does it make sense to relocate […]? With that change, […]
Moving the method to […]

However, I'd prefer not to make the method return errors. The crate should already be validating any user-provided input buffers and the image bit depth in other places, so if either of those things are wrong in […]

This is admittedly a painful tradeoff: having the code panic means we learn about bugs faster and they're often a bit easier to fix, but at the same time it means they potentially have a higher user impact until they're fixed (which can sometimes take quite a while).
Thanks for explaining the philosophy of errors vs panics for this situation. I saw a (possibly) redundant check nearby in […] (lines 819 to 828 in 7642f0f).
Ensure that the input buffer can produce enough bit shifts per input entry to match or exceed the length of the output buffer.
- Reuse the previous row in `unpack_bits` calculation instead of expanding the palette within the buffer
- Use `chunks_exact` and iterate from start to end of the buffer instead of back to front in-place (contrasted in the sketch below)

I found this code easier to reason about, and it might enable someone to optimize this further in the future. I didn't see a performance difference between the two versions.
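To make the "back to front in-place" contrast concrete, the general shape of that older pattern looks something like the sketch below (illustrative only; the buffer layout and names are assumptions, not the crate's previous code verbatim). Because the packed indices and the expanded pixels share one buffer and expansion grows the data, it has to run from the end so unread indices aren't overwritten, which is what the new two-buffer, front-to-back version avoids.

```rust
// Sketch of in-place, back-to-front palette expansion: indices occupy buf[..pixels]
// and the expanded RGB pixels end up occupying buf[..pixels * 3] of the same buffer.
fn expand_in_place_back_to_front(buf: &mut [u8], pixels: usize, palette: &[u8]) {
    let channels = 3;
    for i in (0..pixels).rev() {
        let idx = buf[i] as usize * channels; // read the packed index before it can be clobbered
        let dst = i * channels;
        buf[dst..dst + channels].copy_from_slice(&palette[idx..idx + channels]);
    }
}
```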