This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Improved performance of utf8 check for ascii-only (-40% parquet reading ascii-only columns) #541

Closed
wants to merge 1 commit into from


jorgecarleitao
Owner

read utf8 2^20          time:   [8.1475 ms 8.2333 ms 8.3429 ms]                           
                        change: [-42.546% -41.204% -39.859%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)

Big kudos to @ritchie46 for pointing out this optimization in pola-rs/polars#1553

There are other optimizations on that PR, but this is already pretty big for a +5 ^_^
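
For readers who want the idea in code, here is a minimal sketch of an ASCII-only fast path for UTF-8 validation. It is not the diff from this PR: the function name `validate_utf8_values`, its signature, and the `usize` offsets are assumptions made for illustration. The reasoning is that if the whole values buffer is ASCII, every byte is a complete code point, so every offset falls on a character boundary and the per-value checks can be skipped. The sketch also assumes the offsets are already known to be monotonic and in bounds; otherwise the slicing would panic.

```rust
use std::str::Utf8Error;

/// Hypothetical helper (illustration only, not the code in this PR).
/// Validates that every value delimited by `offsets` in `values` is valid UTF-8.
/// Assumes `offsets` is non-decreasing and every offset is <= values.len().
fn validate_utf8_values(offsets: &[usize], values: &[u8]) -> Result<(), Utf8Error> {
    // Fast path: an all-ASCII buffer is valid UTF-8, and any split of it
    // into sub-slices is valid too, so one linear scan is enough.
    if values.is_ascii() {
        return Ok(());
    }

    // Slow path: check each value on its own, so that a multi-byte
    // character split across two values is reported as an error.
    for window in offsets.windows(2) {
        std::str::from_utf8(&values[window[0]..window[1]])?;
    }
    Ok(())
}
```

With this sketch, `validate_utf8_values(&[0, 5, 7], b"hellopq")` takes the fast path, while a buffer containing `é` falls back to the per-value loop and is rejected only if an offset splits the two-byte character.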

@jorgecarleitao jorgecarleitao changed the title Improved performance of utf8 check for ascii (2x parquet reading ascii-only columns) Improved performance of utf8 check for ascii (-40% parquet reading ascii-only columns) Oct 19, 2021
@jorgecarleitao jorgecarleitao changed the title Improved performance of utf8 check for ascii (-40% parquet reading ascii-only columns) Improved performance of utf8 check for ascii-only (-40% parquet reading ascii-only columns) Oct 19, 2021
@jorgecarleitao
Owner Author

Closing in favor of #542 (man, that was pretty impressive synchronization :D)

@jorgecarleitao jorgecarleitao deleted the faster_utf8 branch October 19, 2021 18:35