Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster u32 parsing #428

Merged
merged 5 commits into from
Dec 16, 2023
Merged

Faster u32 parsing #428

merged 5 commits into from
Dec 16, 2023

Conversation

anforowicz
Copy link
Contributor

This series of 5 commits refactors how the code parses u32 values. The cumulative impact on the runtime of the decode/generated-png:noncompressed-8x8.png benchmark looks as follows:

  • [-7.1475% -6.9281% -6.6885%] (p = 0.00 < 0.05)
  • [-7.7189% -7.5262% -7.3241%] (p = 0.00 < 0.05)
  • [-7.4631% -7.1972% -6.9319%] (p = 0.00 < 0.05)

I expect that this PR will have minimal impact on bigger images, so I didn't even try running the other benchmarks (I assume that any changes observed in the other benchmarks would be just statistical noise).

I think that this PR turned out quite nicely from code simplification and readability perspective. Even if there was no performance impact, I think it might be desirable to land it from this perspective.

Various notes:

  • I wondered whether it might be desirable to add some additional test coverage for partially read u32 values.
    • FWIW this path seems to be covered by decoder::tests::no_data_dup_on_finish - only one test... :-(
    • Once BufRead changes land, maybe we should tweak the render_images test to work with BufReader::with_capacity(1, <original input>) which should test dripping the input byte-by-byte.
    • I did try to run all the tests after temporarily tweaking BufReader::with_capacity calls in src/decoder/mod.rs so that a buffer of only 1 byte was used.
  • I wondered if U32ValueKind::Type after an IDAT chunk really needs to go through 2 state transitions (with a detour via Decoded::ImageDataFlushed during which self.state is set to the old/unchanged value). This is probably in the unnecessary-microoptimizations territory, but I was hoping to simplify the code a bit more. Still... I don't see how to do this cleanly.

This way removes a separate `State::Signature` variant and handles the
PNG signature via two `State::U32` states.  This is desirable because:

* This helps to reuse the performance gains for `State::U32`
  (i.e. avoid accumulating the signature byte-by-byte if the input
  buffer already has all its bytes)
* Avoiding separate code also has additional benefits:
    - Less pressure on the instructions cache
    - Ability to reuse branch prediction for `U32`-related code

The modified code already has reasonable test coverage via
http://www.schaik.com/pngsuite/#corrupted
@anforowicz
Copy link
Contributor Author

@fintelia, can you PTAL?

@fintelia
Copy link
Contributor

👍 This change looks good to me, but I haven't yet had a chance to do a detailed read through. Hopefully should get to that soon

@fintelia fintelia merged commit dbe2cc6 into image-rs:master Dec 16, 2023
19 checks passed
@anforowicz anforowicz deleted the faster-u32-parsing branch December 17, 2023 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants