Eliminate ZST allocations in `Box` and `Vec` #113113
(rustbot has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
⌛ Trying commit d75660ace87ed695200ac1b3bf2026144843eb77 with merge 596ac10b39cf84a4f4742f705dc63599654dcf78...
Does this mean we'll need to check for zero for every deallocation of |
@bors try @rust-timer queue |
⌛ Trying commit 853b6048108f88bc6e62e056b27a32c6484b0ffc with merge 58ad3a76f47a26e116e1d152a72ad86aeb650f41...
☀️ Try build successful - checks-actions
Finished benchmarking commit (58ad3a76f47a26e116e1d152a72ad86aeb650f41): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 662.942s -> 665.669s (0.41%)
Yes. Unfortunately this is required for correctness.
The perf results show a slight regression, but fundamentally I feel this is a cost we have to pay for correct handling of ZST allocations.
r=me modulo CI
@rustbot author
@bors r=Mark-Simulacrum
☀️ Test successful - checks-actions
Finished benchmarking commit (cca3373): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 658.145s -> 659.456s (0.20%)
@rustbot label: +perf-regression-triaged
Indicate that multiplication in Layout::array cannot overflow

Since rust-lang#113113, we have added a check that skips calling into the allocator at all if `capacity == 0`. The global, default allocator will not actually try to allocate though; it returns a dangling pointer explicitly. However, these two checks are not merged/deduplicated by LLVM and so we're comparing to zero twice whenever vectors are allocated/grown. Probably cheap, but also potentially expensive in code size and seems like an unfortunate miss.

This removes that extra check by telling LLVM that the multiplication as part of Layout::array can't overflow, turning the original non-zero value into a zero value afterwards. In my checks locally this successfully drops the duplicate comparisons.

See https://rust.godbolt.org/z/b6nPP9dcK for a code example.

```rust
pub fn foo(elements: usize) -> Vec<u32> {
    Vec::with_capacity(elements)
}
```

r? `@scottmcm` since you touched this in a32305a - curious if you have thoughts on doing this / can confirm my model of this being correct.
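The reasoning in the commit message above relies on one property: for an element type with non-zero size, the byte size of an array layout is zero only when the element count is zero, provided the multiplication does not overflow. The sketch below checks that property from the outside using the public `Layout::array` API. `array_size` is a hypothetical helper written for illustration; it is not part of the referenced commit and says nothing about how the standard library communicates this fact to LLVM.

```rust
use std::alloc::Layout;

/// Hypothetical helper (not from the referenced commit): the size in bytes of
/// the layout for `[T; n]`, or `None` if `Layout::array` rejects the request
/// because the total size would overflow.
pub fn array_size<T>(n: usize) -> Option<usize> {
    Layout::array::<T>(n).ok().map(|layout| layout.size())
}
```

Under this sketch, a zero-element `u32` array has size 0, a three-element one has size 12, and a request for `usize::MAX` elements is rejected rather than wrapping around to a small size.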
This PR fixes 2 issues with `Box` and `RawVec` related to ZST allocations. Specifically, the `Allocator` trait requires that:

These restrictions exist because an allocator implementation is allowed to allocate non-zero amounts of memory for a zero-sized allocation. For example, `malloc` in libc does this.

Currently, ZSTs are handled differently in `Box` and `Vec`:
- `Vec` never allocates when `T` is a ZST or if the vector capacity is 0.
- `Box` just blindly passes everything on to the allocator, including ZSTs.

This causes problems due to the free conversions between `Box<[T]>` and `Vec<T>`, specifically that ZST allocations could get leaked or a dangling pointer could be passed to `deallocate`.

This PR fixes this by changing `Box` to not allocate for zero-sized values and slices. It also fixes a bug in `RawVec::shrink` where shrinking to a size of zero did not actually free the backing memory.
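
As a rough illustration of the rule described above (never hand a zero-sized request to the allocator, and never hand a dangling pointer back to it), here is a minimal sketch on stable Rust. The names `alloc_or_dangling` and `dealloc_unless_zero_sized` are hypothetical and chosen for this example; the sketch uses the global allocator functions in `std::alloc` rather than the unstable `Allocator` trait, and it is not the code from this PR.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Hypothetical helper (not from the PR): returns a dangling, well-aligned
/// pointer for zero-sized layouts and calls the global allocator otherwise.
pub fn alloc_or_dangling(layout: Layout) -> NonNull<u8> {
    if layout.size() == 0 {
        // An alignment is always a non-zero power of two, so using it as an
        // address gives a non-null, suitably aligned pointer. It must never
        // be dereferenced or passed to the allocator.
        return NonNull::new(layout.align() as *mut u8).expect("alignment is non-zero");
    }
    // SAFETY: `layout` has a non-zero size, as `alloc` requires.
    let ptr = unsafe { alloc(layout) };
    NonNull::new(ptr).unwrap_or_else(|| handle_alloc_error(layout))
}

/// Hypothetical counterpart: frees memory only if it was really allocated.
///
/// # Safety
/// `ptr` must have been returned by `alloc_or_dangling(layout)` with the same
/// `layout`, and must not have been freed already.
pub unsafe fn dealloc_unless_zero_sized(ptr: NonNull<u8>, layout: Layout) {
    if layout.size() != 0 {
        // SAFETY: the caller guarantees `ptr` came from `alloc` with `layout`.
        unsafe { dealloc(ptr.as_ptr(), layout) };
    }
}
```

Because both helpers branch on the same `layout.size() == 0` condition, a pointer produced for a zero-sized layout never reaches `dealloc`, and a real allocation is always released. This mirrors the invariant that has to hold on both sides of a `Box<[T]>` and `Vec<T>` conversion.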