Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vec![None; 1024] randomly gets vectorized or not #106911

Closed
frankdavid opened this issue Jan 15, 2023 · 6 comments · Fixed by #106989
Closed

vec![None; 1024] randomly gets vectorized or not #106911

frankdavid opened this issue Jan 15, 2023 · 6 comments · Fixed by #106989
Labels
E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@frankdavid
Copy link

frankdavid commented Jan 15, 2023

I want to initialize a large Vec<Option<i32>> with None.

pub fn vec_none() -> Vec<Option<i32>> {
    vec![None; 1024]
}

Compiling with Rust 1.66 -C opt-level=3 -C target-cpu=skylake generates mov instructions.

However, after initializing with Some(...) the code gets vectorized as expected:

pub fn vec_some() -> Vec<Option<i32>> {
    vec![Some(42); 1024]
}

The generated assembly contains vmovups instructions.

Now, the weird thing is that if I have both functions in the same file, both get vectorized.

See this demonstration. As long asvec_some() is commented out, vec_none() is not vectorized. As soon as you remove the comment, both functions are vectorized.

@scottmcm scottmcm added E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 15, 2023
@scottmcm
Copy link
Member

Easiest fix for Option<i32> specifically would be to update https://stdrs.dev/nightly/x86_64-unknown-linux-gnu/alloc/vec/is_zero/trait.IsZero.html, I think.

We don't guarantee a layout for it, but internally for optimizations I think we'd be safe doing this.

(We can put some tests like const _OPTION_I32_IS_VALID_FROM_ZEROS: Option<i32> = unsafe { crate::mem::zeroed() }; in non-public stuff to help double-check it.)

@nikic nikic added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jan 15, 2023
@frankdavid
Copy link
Author

When Option<T> is None, couldn't the T part of the memory layout be anything since it shouldn't be accessed? Couldn't we use that for optimization? So for example, if None sets the discriminator bit to 0, couldn't we always zero out the entire memory no matter what T is? Or if the discriminator is represented with a non-zero value, couldn't the whole data structure be filled up with that value (or just left unitialized depending on which is faster)?

@joboet
Copy link
Member

joboet commented Jan 16, 2023

When Option<T> is None, couldn't the T part of the memory layout be anything since it shouldn't be accessed? Couldn't we use that for optimization? So for example, if None sets the discriminator bit to 0, couldn't we always zero out the entire memory no matter what T is?

Unfortunately not, because there can be other niches than 0. For instance, None::<File> has the value 0xffff_ffff on Unix platforms.

@frankdavid
Copy link
Author

frankdavid commented Jan 16, 2023

Can we at least assume that two None::<T>-s will always have the same bytes in memory for any fixed T? If so, couldn't we vectorize filling up the Vec with those bytes?

@clubby789
Copy link
Contributor

I think similar to #101685. iirc the main issue is that LLVM's InstCombine removes the store undef instructions before reaching the passes that could use this information to optimize - at this point, it generally becomes a series of stores with gaps between which LLVM no longer knows can be written to freely.

@scottmcm
Copy link
Member

Can we at least assume that two None::<T>-s will always have the same bytes in memory for any fixed T?

No, because something like None::<u32> is half undef, where "the same" isn't a meaningful question.

couldn't we always zero out the entire memory no matter what T is?

That's exactly what I mention IsZero for -- you can use that to check that it's a value where it would be legal to zero out the whole thing, and if that function returns true then vec! uses a zeroing allocator instead of the Vec needing initialize the memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants