rustc_metadata: more safely read/write the index positions. #59887
Conversation
r? @varkor (rust_highfive has picked a reviewer for you, use r? to override)
cc @nnethercote @bluss @michaelwoerister
@bors try
⌛ Trying commit 6156326580aa679e050e20bb974b02660bb12286 with merge f31b67a27f06fb7d6f40349a8e8df8ec54ce9982...
☀️ Try build successful - checks-travis
@rust-timer build f31b67a27f06fb7d6f40349a8e8df8ec54ce9982
Success: Queued f31b67a27f06fb7d6f40349a8e8df8ec54ce9982 with parent 8509127, comparison URL.
Finished benchmarking try commit f31b67a27f06fb7d6f40349a8e8df8ec54ce9982
A small but clear improvement: the best instruction-count reduction is 3%, and lots are in the 0-1% range.
I'm sorry, what? I was expecting a regression, at most!
cc @alexcrichton @rust-lang/wg-compiler-performance
This seems to be a slight performance regression.
How so? Pre-landing measurements indicated it was a slight improvement.
Instruction count doesn't matter =P
rustc_metadata: replace Entry table with one table for each of its fields (AoS -> SoA). *Based on top of #59887*

In #59789 (comment) I noticed that for many cross-crate queries (e.g. `predicates_of(def_id)`), we were deserializing the `rustc_metadata::schema::Entry` for `def_id` *only* to read one field (i.e. `predicates`). But there are several such queries, and `Entry` is not particularly small (in terms of number of fields; the encoding itself is quite compact), so there is a large (and unnecessary) constant factor.

This PR replaces the (random-access) array¹ of `Entry` structures ("AoS") with many separate arrays¹, one for each field that used to be in `Entry` ("SoA"), resulting in the ability to read individual fields separately, with negligible time overhead (in theory), and some size overhead (as these arrays are not sparse).

In a way, the new approach is closer to incremental on-disk caches, which store each query's cached results separately, but it would take significantly more work to unify the two.

For stage1 `libcore`'s metadata blob, the size overhead is `8.44%`, and I have another commit (not initially included in this PR because I want to do perf runs with both) that brings it down to `5.88%`.

¹(in the source, these arrays are called "tables", but perhaps they could use a better name)
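For illustration, a toy sketch of the AoS -> SoA layout change described above (the types and fields here are invented for the example; the real implementation encodes positions into the metadata blob rather than using in-memory `Vec`s):

```rust
// Before: one random-access array of whole entries ("array of structures");
// reading `entries[i].predicates` still deserializes all of `Entry`.
struct Entry {
    kind: u8,
    ty: Option<u32>,         // position of the encoded data in the blob
    predicates: Option<u32>,
    // ...many more fields...
}
struct IndexAoS {
    entries: Vec<Entry>, // entries[def_index]
}

// After: one array per former field ("structure of arrays"); a query like
// `predicates_of` touches only the single table it needs.
struct IndexSoA {
    kinds: Vec<u8>,
    tys: Vec<Option<u32>>,
    predicates: Vec<Option<u32>>, // predicates[def_index]
}
```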
Ah yeah, now I suppose I need to do some experiments to see if I can get LLVM to generate the same code for both (maybe try …)
To summarize my opinion on the metrics:
@nnethercote Instruction counts are cycles divided by cycles per instruction (CPI); CPI may be high-variance, but it is also significant to performance and can vary with the changes we make.
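For reference, the identity behind that sentence (a standard definition, not something stated in this thread):

$$
\text{cycles} = \text{instructions} \times \text{CPI}
\qquad\Longleftrightarrow\qquad
\text{instructions} = \frac{\text{cycles}}{\text{CPI}}
$$

So at roughly constant CPI, the 3% best-case instruction-count reduction reported above would correspond to a similar reduction in cycles.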
Only if you actually put in the effort to make it reproducible (realtime kernel, disable ASLR, pin the thread of interest to one core with nothing else on it, etc.). EDIT: oops, I repeated most of what was said on the irlo thread; I only read it afterwards.
(moved diff comment here so it doesn't get lost) So, looking at the LLVM IR, this slicing is the culprit, I think. At least it's not reading the bytes one by one from memory; that's what I was really worried about. EDIT:

```rust
{
    const N: usize = $len;
    assert_eq!(N, Self::BYTE_LEN);
    let b: &[u8] = $bytes;
    // Reinterpret `&[u8]` as `&[[u8; N]]`: `[u8; N]` has alignment 1, so
    // the pointer cast is always valid, and rounding the length down to
    // `b.len() / N` keeps the new slice within the original allocation.
    unsafe { std::slice::from_raw_parts(b.as_ptr() as *const [u8; N], b.len() / N) }
}
```
I've updated the implementation to bring back some …

@bors try
rustc_metadata: more safely read/write the index positions. This is a small part of a larger refactor that I want to benchmark independently. The final code would be even cleaner than this, so this is sort of a "worst case" test.
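As context for the review excerpts further down, a minimal sketch of what fixed-size reading/writing of index positions can look like. `BYTE_LEN` and `read_from_bytes_at` appear in the actual diff; `write_to_bytes_at` and the `u32` impl details here are illustrative assumptions:

```rust
// Sketch only: each value occupies exactly BYTE_LEN bytes, so entry `i`
// can be read or written at byte offset `i * BYTE_LEN` without decoding
// any of the preceding entries.
trait FixedSizeEncoding: Sized {
    const BYTE_LEN: usize;
    fn read_from_bytes_at(b: &[u8], i: usize) -> Self;
    fn write_to_bytes_at(self, b: &mut [u8], i: usize);
}

// Hypothetical impl for u32 positions, stored little-endian.
impl FixedSizeEncoding for u32 {
    const BYTE_LEN: usize = 4;
    fn read_from_bytes_at(b: &[u8], i: usize) -> Self {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&b[i * 4..i * 4 + 4]);
        u32::from_le_bytes(buf)
    }
    fn write_to_bytes_at(self, b: &mut [u8], i: usize) {
        b[i * 4..i * 4 + 4].copy_from_slice(&self.to_le_bytes());
    }
}
```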
☀️ Try build successful - checks-travis
@rust-timer build 5c37296
Success: Queued 5c37296 with parent a7cef0b, comparison URL.
Finished benchmarking try commit 5c37296
I think this is now within noise? I mean, the generated code should be very close to before (other than using unaligned writes instead of aligned ones).
Looks to me like the perf effects are negligible.
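An aside on the "unaligned instead of aligned" remark, as an illustrative snippet rather than the PR's actual code: writing a `u32` through a byte buffer cannot assume 4-byte alignment, so the compiler emits an unaligned store, which costs about the same as an aligned one on modern x86_64.

```rust
// Hypothetical example: store a u32 at an arbitrary (possibly unaligned)
// byte address. A plain `write` would require `dst` to be 4-byte aligned;
// `write_unaligned` never assumes any alignment.
unsafe fn store_u32_unaligned(dst: *mut u8, v: u32) {
    dst.cast::<u32>().write_unaligned(v);
}
```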
rustc_metadata: replace Entry table with one table for each of its fields (AoS -> SoA). *Based on top of #59887*

In #59789 (comment) I noticed that for many cross-crate queries (e.g. `predicates_of(def_id)`), we were deserializing the `rustc_metadata::schema::Entry` for `def_id` *only* to read one field (i.e. `predicates`). But there are several such queries, and `Entry` is not particularly small (in terms of number of fields; the encoding itself is quite compact), so there is a large (and unnecessary) constant factor.

This PR replaces the (random-access) array¹ of `Entry` structures ("AoS") with many separate arrays¹, one for each field that used to be in `Entry` ("SoA"), resulting in the ability to read individual fields separately, with negligible time overhead (in theory), and some size overhead (as these arrays are not sparse).

In a way, the new approach is closer to incremental on-disk caches, which store each query's cached results separately, but it would take significantly more work to unify the two.

For stage1 `libcore`'s metadata blob, the size overhead is `8.44%`, and I have another commit (~~not initially included because I want to do perf runs with both~~ **EDIT**: added it now) that brings it down to `5.88%`.

¹(in the source, these arrays are called "tables", but perhaps they could use a better name)
@Zoxc, this looks like the kind of thing you like. Would you mind doing the review?
```rust
const BYTE_LEN: usize = $byte_len;

fn read_from_bytes_at(b: &[u8], i: usize) -> Self {
    const BYTE_LEN: usize = $byte_len;
    // HACK(eddyb) ideally this would be done with fully safe code,
    // ...
}
```
Probably could split the unsafe code into a function which casts `[u8]` to `[[u8; N]]`, just to make it a bit clearer what is going on.
I... would do that, if I could write such a function generically.
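For the record, a sketch of the function in question as it could be written once const generics are available; the helper name `cast_to_arrays` is invented, and the inability to make `N` generic at the time of this exchange is exactly eddyb's point:

```rust
/// Cast `&[u8]` to `&[[u8; N]]`, rounding the length down.
///
/// SAFETY: `[u8; N]` has alignment 1, so any byte pointer is sufficiently
/// aligned, and `b.len() / N` chunks never extend past the original slice.
fn cast_to_arrays<const N: usize>(b: &[u8]) -> &[[u8; N]] {
    assert!(N != 0, "zero-sized chunks are meaningless");
    unsafe { std::slice::from_raw_parts(b.as_ptr() as *const [u8; N], b.len() / N) }
}
```

(In later Rust versions, `<[u8]>::as_chunks` in the standard library covers essentially the same operation.)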
I think we should land this separately for better bisectability.
@bors r+
📌 Commit f51e6c7 has been approved by …
rustc_metadata: more safely read/write the index positions. This is a small part of a larger refactor that I want to benchmark independently. The final code would be even cleaner than this, so this is sort of a "worst case" test.
☀️ Test successful - checks-travis, status-appveyor