Support using simdutf8 for `validate_string_view` and other UTF-8 validation #7014
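I mentioned this in #6668, but I was thinking we could just globally replace calls to `std::str::from_utf8` with a wrapper like this:

```rust
// With the `simdutf8` feature: run the fast `basic` validator first and
// only re-run the slower `compat` validator (which reports where the error
// is) when the input turns out to be invalid.
#[cfg(feature = "simdutf8")]
#[inline(always)]
pub fn from_utf8(val: &[u8]) -> Result<&str, simdutf8::compat::Utf8Error> {
    match simdutf8::basic::from_utf8(val) {
        Ok(result) => Ok(result),
        Err(_) => simdutf8::compat::from_utf8(val),
    }
}

// Without the feature: fall back to the standard library validator.
#[cfg(not(feature = "simdutf8"))]
#[inline(always)]
pub fn from_utf8(val: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(val)
}
```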
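The two branches of the wrapper return different error types (`simdutf8::compat::Utf8Error` vs `std::str::Utf8Error`), so call sites should stay portable only if they restrict themselves to methods both types provide, such as `valid_up_to()` and `error_len()`. A minimal sketch of such a call site, assuming a hypothetical helper `validate_values`; only the std branch of the wrapper is repeated so the example compiles without the `simdutf8` crate:

```rust
/// Std-only branch of the proposed wrapper, repeated here so this sketch
/// compiles without the `simdutf8` crate.
#[inline(always)]
pub fn from_utf8(val: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(val)
}

/// Hypothetical call site: validate each value and describe the first failure.
/// It only uses `valid_up_to()` and `error_len()`, which exist on both
/// `std::str::Utf8Error` and `simdutf8::compat::Utf8Error`, so it should
/// compile unchanged whichever branch of the wrapper is enabled.
pub fn validate_values(values: &[&[u8]]) -> Result<(), String> {
    for (idx, &value) in values.iter().enumerate() {
        if let Err(e) = from_utf8(value) {
            // `error_len() == None` means the input ended mid-sequence.
            let detail = match e.error_len() {
                Some(n) => format!("{n} invalid byte(s)"),
                None => "truncated sequence at end of input".to_string(),
            };
            return Err(format!(
                "value {idx} is not valid UTF-8: valid up to byte {}, then {detail}",
                e.valid_up_to()
            ));
        }
    }
    Ok(())
}
```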
I am embarrassed to admit I missed the StringView one; I will make a PR to switch.
Update: I don't think I was going crazy. I can't find any remaining hot code path doing UTF-8 validation where I think switching would make any difference to performance. To be clear, I don't think @etseidl's idea is a bad one; I just don't think it will help much.
That is entirely possible. I'm still testing to see if I have any use cases that would benefit.
Maybe I should have used the positive rather than a double negative: "@etseidl's idea is a good one; I just don't think it will help much" 🤦
There are places that still do UTF-8 validation, for example arrow-rs/arrow-data/src/byte_view.rs, line 145 in 3bf29a2. This still has ...

I think there are also some other places that could benefit a bit from this (e.g. reading CSV/JSON, casting, ...).
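For reference, a rough sketch of what UTF-8 validation of string views involves, following the Arrow view layout: each view is 16 bytes whose low 4 bytes hold the length; values of up to 12 bytes are stored inline right after the length, while longer ones store a 4-byte prefix, a buffer index and an offset into a data buffer. This is not the arrow-rs implementation: `validate_string_views`, `inline_view`, `long_view` and `read_u32` are hypothetical names, views are modelled as plain `u128`s, the prefix is not cross-checked, and `from_utf8` stands in for the feature-gated wrapper proposed above.

```rust
/// Stand-in for the feature-gated `from_utf8` wrapper (std branch only).
fn from_utf8(val: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(val)
}

/// Read a little-endian u32 starting at byte `at` of a 16-byte view.
fn read_u32(b: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

/// Build an inline view (length <= 12): [len: u32 LE][data, zero-padded].
fn inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= 12);
    let mut b = [0u8; 16];
    b[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    b[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(b)
}

/// Build a non-inline view: [len][4-byte prefix][buffer_index][offset], all LE.
fn long_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    assert!(data.len() > 12);
    let mut b = [0u8; 16];
    b[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    b[4..8].copy_from_slice(&data[..4]);
    b[8..12].copy_from_slice(&buffer_index.to_le_bytes());
    b[12..16].copy_from_slice(&offset.to_le_bytes());
    u128::from_le_bytes(b)
}

/// Check that every view points at in-bounds bytes that are valid UTF-8.
fn validate_string_views(views: &[u128], buffers: &[Vec<u8>]) -> Result<(), String> {
    for (idx, view) in views.iter().enumerate() {
        let b = view.to_le_bytes();
        let len = read_u32(&b, 0) as usize;
        let bytes: &[u8] = if len <= 12 {
            // Inline value: the data follows the 4-byte length.
            &b[4..4 + len]
        } else {
            // Out-of-line value: look it up in the referenced data buffer.
            let buffer_index = read_u32(&b, 8) as usize;
            let offset = read_u32(&b, 12) as usize;
            let buffer = buffers
                .get(buffer_index)
                .ok_or_else(|| format!("view {idx}: buffer {buffer_index} does not exist"))?;
            offset
                .checked_add(len)
                .and_then(|end| buffer.get(offset..end))
                .ok_or_else(|| format!("view {idx}: range out of bounds"))?
        };
        from_utf8(bytes).map_err(|e| {
            format!("view {idx}: invalid UTF-8 after {} valid bytes", e.valid_up_to())
        })?;
    }
    Ok(())
}
```

Each view costs one validator call here, which is why the per-value overhead of the validator matters for short strings.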
So I think we can follow @etseidl's idea and replace all such calls.
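Call sites that validate many small values one at a time (e.g. when casting a binary array to strings) may gain less from a SIMD validator, since per-call overhead can dominate for short inputs. One common alternative, sketched here under the assumption of an offsets/values layout like Arrow's variable-length binary arrays, is to validate the whole referenced byte range in a single call and then check that every offset falls on a character boundary: if the range is valid UTF-8 and each value starts and ends on a boundary, each value is valid on its own. `validate_offsets_utf8` is a hypothetical name, offsets are simplified to `usize`, they are assumed to be non-decreasing and in bounds, and `from_utf8` again stands in for the wrapper.

```rust
/// Stand-in for the feature-gated `from_utf8` wrapper (std branch only).
fn from_utf8(val: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(val)
}

/// Validate all values of an offsets/values string layout with one UTF-8 call.
/// Value `i` is `values[offsets[i]..offsets[i + 1]]`; offsets are assumed to
/// be non-decreasing and within `values` (not re-checked here).
fn validate_offsets_utf8(offsets: &[usize], values: &[u8]) -> Result<(), String> {
    let (first, last) = match (offsets.first(), offsets.last()) {
        (Some(&f), Some(&l)) => (f, l),
        _ => return Ok(()), // no offsets means no values
    };
    // Validate only the bytes actually referenced by the values, in one call.
    let s = from_utf8(&values[first..last])
        .map_err(|e| format!("invalid UTF-8 at byte {}", first + e.valid_up_to()))?;
    // The whole range is valid, so each value is valid iff its bounds
    // are character boundaries.
    for (i, &off) in offsets.iter().enumerate() {
        if !s.is_char_boundary(off - first) {
            return Err(format!("offset {i} ({off}) splits a multi-byte character"));
        }
    }
    Ok(())
}
```

The boundary loop only inspects one byte per offset, so the expensive work is a single long validation call, which is the case a SIMD validator is best suited to.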
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We have several places where `std::str::from_utf8` is used for UTF-8 validation, such as `validate_string_view`. We can speed these up by supporting `simdutf8` here as well.
Describe the solution you'd like
Similar to #6668, optionally switch to using `simdutf8` instead.
Describe alternatives you've considered
Additional context