Generator optimization: Overlap locals that never have storage live at the same time #60187

tmandry · 2019-04-23T01:28:24Z

The specific goal of this optimization is to optimize async fns which use await!. Notably, await! has an enclosing scope around the futures it awaits (definition), which we rely on to implement the optimization.

More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. We care about storage-liveness when computing the layout, because knowing a field is StorageDead is the only way to prove it will not be accessed, either directly or through a reference.

To determine whether we can overlap two locals in the generator layout, we look at whether they might both be StorageLive at any point in the MIR. We use the MaybeStorageLive dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with.

Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout.

For the layout computation, we use a simplified approach.

Start with the set of locals assigned to only one variant. The rest are disqualified.
For each pair of locals which may conflict and are not assigned to the same variant, we pick one local to disqualify from overlapping.

Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone".

So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this:

You can think of a generator as an enum, where some fields are shared between variants. e.g.

enum Generator {
  Unresumed,
  Poisoned,
  Returned,
  Suspend0(A, B, X),
  Suspend1(B),
  Suspend2(A, Y, Z),
}

where every mention of A and B refer to the same field, which does not move when changing variants. Note that A and B would automatically be sent to the prefix in this example. Assuming that X is never StorageLive at the same time as either Y or Z, it would be allowed to overlap with them.

Note that if two locals (Y and Z in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time.

Depends on:

Multi-variant layouts for generators #59897 Multi-variant layouts for generators
Preserve local scopes in generator MIR #60840 Preserve local scopes in generator MIR
Emit StorageDead along unwind paths for generators #61373 Emit StorageDead along unwind paths for generators

Before merging:

~~Wrap the types of all generator fields in MaybeUninitialized in layout::ty::field~~ (opened Don't ignore generator fields in miri #60889)
Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location)
Clean up TODO
Fix the layout code to enforce that the same field never moves around in the generator
Add tests for async/await
~~Reduce # bits we store by half, since the conflict relation is symmetric~~ (note: decided not to do this, for simplicity)
Store liveness information for each yield point in our GeneratorLayout, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at Multi-variant layouts for generators #59897 (comment))

rust-highfive · 2019-04-23T01:28:28Z

r? @davidtwco

(rust_highfive has picked a reviewer for you, use r? to override)

Zoxc · 2019-04-23T01:42:47Z

@bors try

bors · 2019-04-23T01:43:19Z

⌛ Trying commit e27b6fd with merge faed5e2...

[WIP] Generator optimization: Overlap locals that never have storage live at the same time This depends on #59897 (which is currently failing). Only the commits from "Emit StorageDead for all locals" and on are relevant. The reason I'm opening this now is so we can get some perf numbers and discuss the approach.

rust-highfive · 2019-04-23T01:43:51Z

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

travis_time:end:155d63f6:start=1555982977591402820,finish=1555982978514722160,duration=923319340
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:37] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:37] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:03:39] some tidy checks failed
[00:03:39] 
[00:03:39] 
[00:03:39] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:39] 
[00:03:39] 
[00:03:39] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:39] Build completed unsuccessfully in 0:00:45
[00:03:39] Build completed unsuccessfully in 0:00:45
[00:03:39] make: *** [tidy] Error 1
[00:03:39] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0094b6c6
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Tue Apr 23 01:33:29 UTC 2019
---
travis_time:end:15be1d78:start=1555983210069161032,finish=1555983210073859753,duration=4698721
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:07c2e027
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:11effaa0
travis_time:start:11effaa0
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:078d6cef
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

bors · 2019-04-23T04:01:06Z

☀️ Try build successful - checks-travis
Build commit: faed5e2

Zoxc · 2019-04-23T04:19:52Z

@rust-timer build faed5e2

rust-timer · 2019-04-23T04:19:53Z

Success: Queued faed5e2 with parent 004c549, comparison URL.

rust-timer · 2019-04-23T15:52:34Z

Finished benchmarking try commit faed5e2

tmandry · 2019-04-23T20:06:47Z

Benchmarking results do not look good. Next I'm trying an approach that doesn't create more basic blocks in the MIR. Instead it propagates StorageDeads forward along the cleanup path, which means we may hit some extra StorageDeads along the way. This shouldn't cause any problems, but I will have to update the analysis pass of the optimization if we keep this.

If the benchmarks still do not look good, I'll try a smarter approach of assembling the StorageDead statements than putting them in a vec and reversing them. At this point I suspect the larger performance hit is due to new basic blocks, but I'm not sure.

rust-highfive · 2019-04-23T20:27:09Z

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

travis_time:end:20ec75c0:start=1556049886727429768,finish=1556049981616995594,duration=94889565826
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:52] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:52] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:03:54] some tidy checks failed
[00:03:54] 
[00:03:54] 
[00:03:54] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:54] 
[00:03:54] 
[00:03:54] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:54] Build completed unsuccessfully in 0:00:44
[00:03:54] Build completed unsuccessfully in 0:00:44
[00:03:54] Makefile:67: recipe for target 'tidy' failed
[00:03:54] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:02451030
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Tue Apr 23 20:10:25 UTC 2019
---
travis_time:end:00da78d6:start=1556050226138966140,finish=1556050226150654242,duration=11688102
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:25678716
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:00be7c32
travis_time:start:00be7c32
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:15f7112a
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

cramertj · 2019-04-23T20:59:30Z

@bors try

bors · 2019-04-23T20:59:42Z

⌛ Trying commit 8bbe3449442738b1a545953a5bcf2a3ec20cdbd4 with merge 32434c7ac565769649a967e525da3d6dabfaca0a...

bors · 2019-04-23T23:14:11Z

☀️ Try build successful - checks-travis
Build commit: 32434c7ac565769649a967e525da3d6dabfaca0a

cramertj · 2019-04-23T23:16:43Z

@rust-timer build 32434c7ac565769649a967e525da3d6dabfaca0a

rust-timer · 2019-04-23T23:16:53Z

Success: Queued 32434c7ac565769649a967e525da3d6dabfaca0a with parent 4eff852, comparison URL.

rust-timer · 2019-04-23T23:30:18Z

Finished benchmarking try commit 32434c7ac565769649a967e525da3d6dabfaca0a

Zoxc · 2019-04-24T01:46:50Z

@tmandry Only changing StorageDead generation for generators should get rid of the performance regressions.

tmandry · 2019-04-24T02:01:50Z

@tmandry Only changing StorageDead generation for generators should get rid of the performance regressions.

@Zoxc yep, that's what I'll fall back to, if needed!

rust-timer · 2019-04-24T05:49:46Z

Finished benchmarking try commit 32434c7ac565769649a967e525da3d6dabfaca0a

rust-highfive · 2019-04-25T01:56:43Z

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

travis_time:end:1f485cf0:start=1556156481484382630,finish=1556156482401242044,duration=916859414
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:04:11] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:04:11] tidy error: /checkout/src/librustc_mir/transform/generator.rs:561: TODO is deprecated; use FIXME
[00:04:13] some tidy checks failed
[00:04:13] 
[00:04:13] 
[00:04:13] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:04:13] 
[00:04:13] 
[00:04:13] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:04:13] Build completed unsuccessfully in 0:00:45
[00:04:13] Build completed unsuccessfully in 0:00:45
[00:04:13] make: *** [tidy] Error 1
[00:04:13] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:201bfb6e
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Thu Apr 25 01:45:47 UTC 2019
---
travis_time:end:02be76d9:start=1556156747911994846,finish=1556156747916916594,duration=4921748
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:0e908098
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:084779d0
travis_time:start:084779d0
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0f7238cd
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

bors · 2019-04-25T06:44:35Z

☔ The latest upstream changes (presumably #60250) made this pull request unmergeable. Please resolve the merge conflicts.

bors · 2019-06-11T21:29:21Z

📌 Commit aeefbc4 has been approved by eddyb

bors · 2019-06-12T02:13:20Z

⌛ Testing commit aeefbc4 with merge 05083c2...

Generator optimization: Overlap locals that never have storage live at the same time The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization. More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.** To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with. Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout. For the layout computation, we use a simplified approach. 1. Start with the set of locals assigned to only one variant. The rest are disqualified. 2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping. Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone". So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this: You can think of a generator as an enum, where some fields are shared between variants. e.g. ```rust enum Generator { Unresumed, Poisoned, Returned, Suspend0(A, B, X), Suspend1(B), Suspend2(A, Y, Z), } ``` where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them. Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time. --- Depends on: - [x] #59897 Multi-variant layouts for generators - [x] #60840 Preserve local scopes in generator MIR - [x] #61373 Emit StorageDead along unwind paths for generators Before merging: - [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened #60889) - [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location) - [x] Clean up TODO - [x] Fix the layout code to enforce that the same field never moves around in the generator - [x] Add tests for async/await - [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity) - [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at #59897 (comment))

…ddyb Generator optimization: Overlap locals that never have storage live at the same time The specific goal of this optimization is to optimize async fns which use `await!`. Notably, `await!` has an enclosing scope around the futures it awaits ([definition](https://github.com/rust-lang/rust/blob/08bfe16129b0621bc90184f8704523d4929695ef/src/libstd/macros.rs#L365-L381)), which we rely on to implement the optimization. More generally, the optimization allows overlapping the storage of some locals which are never storage-live at the same time. **We care about storage-liveness when computing the layout, because knowing a field is `StorageDead` is the only way to prove it will not be accessed, either directly or through a reference.** To determine whether we can overlap two locals in the generator layout, we look at whether they might *both* be `StorageLive` at any point in the MIR. We use the `MaybeStorageLive` dataflow analysis for this. We iterate over every location in the MIR, and build a bitset for each local of the locals it might potentially conflict with. Next, we assign every saved local to one or more variants. The variants correspond to suspension points, and we include the set of locals live across a given suspension point in the variant. (Note that we use liveness instead of storage-liveness here; this ensures that the local has actually been initialized in each variant it has been included in. If the local is not live across a suspension point, then it doesn't need to be included in that variant.). It's important to note that the variants are a "view" into our layout. For the layout computation, we use a simplified approach. 1. Start with the set of locals assigned to only one variant. The rest are disqualified. 2. For each pair of locals which may conflict *and are not assigned to the same variant*, we pick one local to disqualify from overlapping. Disqualified locals go into a non-overlapping "prefix" at the beginning of our layout. This means they always have space reserved for them. All the locals that are allowed to overlap in each variant are then laid out after this prefix, in the "overlap zone". So, if A and B were disqualified, and X, Y, and Z were all eligible for overlap, our generator might look something like this: You can think of a generator as an enum, where some fields are shared between variants. e.g. ```rust enum Generator { Unresumed, Poisoned, Returned, Suspend0(A, B, X), Suspend1(B), Suspend2(A, Y, Z), } ``` where every mention of `A` and `B` refer to the same field, which does not move when changing variants. Note that `A` and `B` would automatically be sent to the prefix in this example. Assuming that `X` is never `StorageLive` at the same time as either `Y` or `Z`, it would be allowed to overlap with them. Note that if two locals (`Y` and `Z` in this case) are assigned to the same variant in our generator, their memory would never overlap in the layout. Thus they can both be eligible for the overlapping section, even if they are storage-live at the same time. --- Depends on: - [x] rust-lang#59897 Multi-variant layouts for generators - [x] rust-lang#60840 Preserve local scopes in generator MIR - [x] rust-lang#61373 Emit StorageDead along unwind paths for generators Before merging: - [x] ~Wrap the types of all generator fields in `MaybeUninitialized` in layout::ty::field~ (opened rust-lang#60889) - [x] Make PR description more complete (e.g. explain why storage liveness is important and why we have to check every location) - [x] Clean up TODO - [x] Fix the layout code to enforce that the same field never moves around in the generator - [x] Add tests for async/await - [x] ~Reduce # bits we store by half, since the conflict relation is symmetric~ (note: decided not to do this, for simplicity) - [x] Store liveness information for each yield point in our `GeneratorLayout`, that way we can emit more useful debuginfo AND tell miri which fields are definitely initialized for a given variant (see discussion at rust-lang#59897 (comment))

bors · 2019-06-12T04:57:29Z

☀️ Test successful - checks-travis, status-appveyor
Approved by: eddyb
Pushing 05083c2 to master...

@ghost

Rollup of 9 pull requests Successful merges: - #60187 (Generator optimization: Overlap locals that never have storage live at the same time) - #61348 (Implement Clone::clone_from for Option and Result) - #61568 (Use Symbol, Span in libfmt_macros) - #61632 (ci: Collect CPU usage statistics on Azure) - #61654 (use pattern matching for slices destructuring) - #61671 (implement nth_back for Range(Inclusive)) - #61688 (is_fp and is_floating_point do the same thing, remove the former) - #61705 (Pass cflags rather than cxxflags to LLVM as CMAKE_C_FLAGS) - #61734 (Migrate rust-by-example to MdBook2) Failed merges: r? @ghost

@tmandry

rustc: correctly transform memory_index mappings for generators. Fixes #61793, closes #62011 (previous attempt at fixing #61793). During #60187, I made the mistake of suggesting that the (re-)computation of `memory_index` in `ty::layout`, after generator-specific logic split/recombined fields, be done off of the `offsets` of those fields (which needed to be computed anyway), as opposed to the `memory_index`. `memory_index` maps each field to its in-memory order index, which ranges over the same `0..n` values as the fields themselves, making it a bijective mapping, and more specifically a permutation (indeed, it's the permutation resulting from field reordering optimizations). Each field has an unique "memory index", meaning a sort based on them, even an unstable one, will not put them in the wrong order. But offsets don't have that property, because of ZSTs (which do not increase the offset), so sorting based on the offset of fields alone can (and did) result in wrong orders. Instead of going back to sorting based on (slices/subsets of) `memory_index`, or special-casing ZSTs to make sorting based on offsets produce the right results (presumably), as #62011 does, I opted to drop sorting altogether and focus on `O(n)` operations involving *permutations*: * a permutation is easily inverted (see the `invert_mapping` `fn`) * an `inverse_memory_index` was already employed in other parts of the `ty::layout` code (that is, a mapping from memory order to field indices) * inverting twice produces the original permutation, so you can invert, modify, and invert again, if it's easier to modify the inverse mapping than the direct one * you can modify/remove elements in a permutation, as long as the result remains dense (i.e. using every integer in `0..len`, without gaps) * for splitting a `0..n` permutation into disjoint `0..x` and `x..n` ranges, you can pick the elements based on a `i < x` / `i >= x` predicate, and for the latter, also subtract `x` to compact the range to `0..n-x` * in the general case, for taking an arbitrary subset of the permutation, you need a renumbering from that subset to a dense `0..subset.len()` - but notably, this is still `O(n)`! * you can merge permutations, as long as the result remains disjoint (i.e. each element is unique) * for concatenating two `0..n` and `0..m` permutations, you can renumber the elements in the latter to `n..n+m` * some of these operations can be combined, and an inverse mapping (be it a permutation or not) can still be used instead of a forward one by changing the "domain" of the loop performing the operation I wish I had a nicer / more mathematical description of the recombinations involved, but my focus was to fix the bug (in a way which preserves information more directly than sorting would), so I may have missed potential changes in the surrounding generator layout code, that would make this all more straight-forward. r? @tmandry

I tested the generator optimizations in rust-lang#60187 and rust-lang#61922 on the Fuchsia build, and noticed that some small generators (about 8% of the async fns in our build) increased in size slightly. This is because in rust-lang#60187 we split the fields into two groups, a "prefix" non-overlap region and an overlap region, and lay them out separately. This can introduce unnecessary padding bytes between the two groups. In every single case in the Fuchsia build, it was due to there being only a single variant being used in the overlap region. This means that we aren't doing any overlapping, period. So it's better to combine the two regions into one and lay out all the fields at once, which is what this change does.

@cramertj

…ssions, r=cramertj Fix generator size regressions due to optimization I tested the generator optimizations in rust-lang#60187 and rust-lang#61922 on the Fuchsia build, and noticed that some small generators (about 8% of the async fns in our build) increased in size slightly. This is because in rust-lang#60187 we split the fields into two groups, a "prefix" non-overlap region and an overlap region, and lay them out separately. This can introduce unnecessary padding bytes between the two groups. In every single case in the Fuchsia build, it was due to there being only a single variant being used in the overlap region. This means that we aren't doing any overlapping, period. So it's better to combine the two regions into one and lay out all the fields at once, which is what this change does. r? @cramertj cc @eddyb @Zoxc

rust-highfive assigned davidtwco Apr 23, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 23, 2019

tmandry changed the title ~~Generator optimization: Overlap locals that never have storage live at the same time~~ [WIP] Generator optimization: Overlap locals that never have storage live at the same time Apr 23, 2019

rust-lang deleted a comment from rust-timer Apr 23, 2019

tmandry force-pushed the generator-optimization branch 2 times, most recently from 7294f02 to dcd5da8 Compare April 25, 2019 01:40

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 11, 2019

Centril mentioned this pull request Jun 12, 2019

Rollup of 9 pull requests #61758

Merged

bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 12, 2019

bors merged commit aeefbc4 into rust-lang:master Jun 12, 2019

bors mentioned this pull request Jun 12, 2019

rustc: replace TyCtxt<'a, 'gcx, 'tcx> with TyCtxt<'gcx, 'tcx>. #61722

Merged

Matthias247 mentioned this pull request Jun 13, 2019

Codegen ICE/regression with 2019-06-12 nightly when using async fn<T: Fn()>(&self, T) #61793

Closed

tmandry deleted the generator-optimization branch June 14, 2019 00:47

tmandry mentioned this pull request Jun 14, 2019

Generator size: unwinding and drops force extra generator state allocation #59123

Closed

RalfJung mentioned this pull request Jun 18, 2019

Don't store locals that have been moved from in generators #61922

Merged

eddyb mentioned this pull request Jun 23, 2019

rustc: correctly transform memory_index mappings for generators. #62072

Merged

RalfJung mentioned this pull request Jun 23, 2019

Collection of assumptions about MIR semantics rust-lang/unsafe-code-guidelines#150

Closed

This was referenced Jul 8, 2019

Drop with guaranteed move elision #62508

Open

Generator size: borrowed variables are assumed live across following yield points #59087

Open

4teap mentioned this pull request Jul 24, 2019

Async fn doubles argument size #62958

Open

This was referenced Jul 27, 2019

Fix generator size regressions due to optimization #63034

Merged

Use MaybeUninit to produce more correct layouts #63035

Merged

ecstatic-morse mentioned this pull request Mar 10, 2020

MaybeRequiresStorage for generators is not necessary #69902

Closed

tmandry mentioned this pull request Apr 25, 2022

rebinding values in an async function can double the memory usage #96084

Open

cramertj mentioned this pull request Sep 5, 2024

Exhaustive stack usage llvm/llvm-project#43598

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generator optimization: Overlap locals that never have storage live at the same time #60187

Generator optimization: Overlap locals that never have storage live at the same time #60187

tmandry commented Apr 23, 2019 •

edited

Loading

rust-highfive commented Apr 23, 2019

Zoxc commented Apr 23, 2019

bors commented Apr 23, 2019

rust-highfive commented Apr 23, 2019

bors commented Apr 23, 2019

Zoxc commented Apr 23, 2019

rust-timer commented Apr 23, 2019

rust-timer commented Apr 23, 2019

tmandry commented Apr 23, 2019

rust-highfive commented Apr 23, 2019

cramertj commented Apr 23, 2019

bors commented Apr 23, 2019

bors commented Apr 23, 2019

cramertj commented Apr 23, 2019 •

edited

Loading

rust-timer commented Apr 23, 2019

rust-timer commented Apr 23, 2019

Zoxc commented Apr 24, 2019

tmandry commented Apr 24, 2019

rust-timer commented Apr 24, 2019

rust-highfive commented Apr 25, 2019

bors commented Apr 25, 2019

bors commented Jun 11, 2019

bors commented Jun 12, 2019

bors commented Jun 12, 2019

Generator optimization: Overlap locals that never have storage live at the same time #60187

Generator optimization: Overlap locals that never have storage live at the same time #60187

Conversation

tmandry commented Apr 23, 2019 • edited Loading

rust-highfive commented Apr 23, 2019

Zoxc commented Apr 23, 2019

bors commented Apr 23, 2019

rust-highfive commented Apr 23, 2019

bors commented Apr 23, 2019

Zoxc commented Apr 23, 2019

rust-timer commented Apr 23, 2019

rust-timer commented Apr 23, 2019

tmandry commented Apr 23, 2019

rust-highfive commented Apr 23, 2019

cramertj commented Apr 23, 2019

bors commented Apr 23, 2019

bors commented Apr 23, 2019

cramertj commented Apr 23, 2019 • edited Loading

rust-timer commented Apr 23, 2019

rust-timer commented Apr 23, 2019

Zoxc commented Apr 24, 2019

tmandry commented Apr 24, 2019

rust-timer commented Apr 24, 2019

rust-highfive commented Apr 25, 2019

bors commented Apr 25, 2019

bors commented Jun 11, 2019

bors commented Jun 12, 2019

bors commented Jun 12, 2019

tmandry commented Apr 23, 2019 •

edited

Loading

cramertj commented Apr 23, 2019 •

edited

Loading