I noticed this too recently and believe this is an issue in LLVM. Rust simply emits a saturating cast intrinsic, and on x86_64 LLVM does not seem to know how to handle it using vector instructions. I opened llvm/llvm-project#59682 about that intrinsic preventing autovectorization, but it is probably the same issue with explicit vectors.
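For reference, a minimal sketch of the explicit-vector case (my own hypothetical example, not code from this thread): if I understand the lowering correctly, a `portable_simd` lane-wise cast should emit the vector form of the same `llvm.fptosi.sat.*` intrinsic as the scalar `as` cast, so it would hit the same problem.

```rust
#![feature(portable_simd)] // nightly-only
use std::simd::Simd;

// Hypothetical illustration: this lane-wise cast should lower to the vector
// form of `llvm.fptosi.sat.*`, the intrinsic llvm/llvm-project#59682 is about.
pub fn cast_lanes(v: Simd<f32, 8>) -> Simd<i32, 8> {
    // Saturating semantics per lane: clamps to i32::MIN/i32::MAX, NaN becomes 0.
    v.cast::<i32>()
}
```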
It seems that float-to-int casts fall back to individual `cvttss2si` and the like on x86_64:

*Output with scalar casts and bounds checks*
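As a concrete repro, a loop along these lines (my own minimal sketch, not necessarily the exact code behind the collapsed output above) shows the scalar fallback:

```rust
// Minimal sketch: with Rust's saturating `as` cast, this loop currently
// compiles to one `cvttss2si` plus bounds/NaN fixup per element on x86_64
// instead of a vectorized `cvttps2dq` sequence.
pub fn cast_all(src: &[f32], dst: &mut [i32]) {
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = s as i32; // saturates at i32::MIN/i32::MAX; NaN maps to 0
    }
}
```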
I would have expected the bounds checking and cast to be done with vector ops. Here's a rough sketch (might not be right around the edges!):
*Output with vector cast and bounds checks*
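The collapsed sketch is not reproduced here; as one illustration of the kind of sequence meant, here is my own untested reconstruction with plain SSE2 intrinsics (a rough sketch under the same "might not be right around the edges" caveat, `fptosi_sat_x4` is a made-up name):

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Four saturating f32 -> i32 casts at once with SSE2 (rough, untested sketch).
#[cfg(target_arch = "x86_64")]
unsafe fn fptosi_sat_x4(x: __m128) -> __m128i {
    // cvttps2dq already yields i32::MIN (0x8000_0000) for overflow and NaN
    // lanes, so negative overflow is already saturated correctly.
    let raw = _mm_cvttps_epi32(x);
    // Lanes >= 2^31 must saturate to i32::MAX instead: XOR with an all-ones
    // mask flips 0x8000_0000 to 0x7FFF_FFFF in exactly those lanes.
    let too_big = _mm_castps_si128(_mm_cmpge_ps(x, _mm_set1_ps(2147483648.0)));
    let fixed = _mm_xor_si128(raw, too_big);
    // Rust maps NaN to 0; cmpeq(x, x) is false only for NaN lanes.
    let not_nan = _mm_castps_si128(_mm_cmpeq_ps(x, x));
    _mm_and_si128(fixed, not_nan)
}
```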
I call out x86 because aarch64 doesn't seem to have this problem (presumably because its vector `fcvtzs` already provides the saturating, NaN-to-zero semantics the cast needs). I haven't checked other architectures.
*Output on aarch64 is vectorized*
Meta