Replace the default branch with an unreachable branch If it is the last variant #120268

Merged (5 commits) on Mar 8, 2024

Member

@DianQK DianQK commented Jan 23, 2024

Fixes #119520. Fixes #110097.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

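The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements in LLVM, I think it would be more appropriate to address this issue in rustc first. After all, rustc can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See llvm/llvm-project#78578.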

r? compiler

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 23, 2024
@rustbot
Collaborator

rustbot commented Jan 23, 2024

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@oli-obk
Contributor

oli-obk commented Jan 23, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 23, 2024
@bors
Contributor

bors commented Jan 23, 2024

⌛ Trying commit 5617d16 with merge f893d88...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 23, 2024
…tchs, r=<try>

Replace the default branch with an unreachable branch If it is the last variant

Fixes rust-lang#119520.

LLVM currently has limited ability to eliminate dead branches in switches, even with the patch of llvm/llvm-project#73446.

The main reasons are as follows:

- Additional costs are required to calculate the range of values, and there exist many scenarios that cannot be analyzed accurately.
- Matching values by bitwise calculation cannot handle odd branches, nor can it handle values like `-1, 0, 1`. See [SimplifyCFG.cpp#L5424](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L5424) and https://llvm.godbolt.org/z/qYMqhvMa8
- The current range information is continuous, even if the metadata for the range is submitted. See [ConstantRange.cpp#L1869-L1870](https://github.com/llvm/llvm-project/blob/llvmorg-17.0.6/llvm/lib/IR/ConstantRange.cpp#L1869-L1870).
- The metadata of the range may be lost in passes such as SROA. See https://rust.godbolt.org/z/e7f87vKMK.

Although we can make improvements, I think it would be more appropriate to put this issue to rustc first. After all, we can easily know the possible values.

Note that we've currently found a slow compilation problem in the presence of unreachable branches. See
llvm/llvm-project#78578.

r? compiler
@bors
Contributor

bors commented Jan 23, 2024

☀️ Try build successful - checks-actions
Build commit: f893d88 (f893d886617ac224771fc6bbfd026e43d860599d)

@rust-timer
Collaborator

Finished benchmarking commit (f893d88): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.3%, -0.2%] | 12 |
| Improvements ✅ (secondary) | -0.3% | [-0.4%, -0.3%] | 15 |
| All ❌✅ (primary) | -0.2% | [-0.3%, -0.2%] | 12 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [2.2%, 2.2%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -5.2% | [-10.4%, -1.4%] | 5 |
| Improvements ✅ (secondary) | -4.0% | [-4.0%, -4.0%] | 1 |
| All ❌✅ (primary) | -4.0% | [-10.4%, 2.2%] | 6 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.4% | [-3.4%, -3.4%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.5%] | 13 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.2% | [-0.4%, -0.1%] | 5 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [-0.4%, 0.5%] | 18 |

Bootstrap: 661.891s -> 662.949s (0.16%)
Artifact size: 308.33 MiB -> 308.27 MiB (-0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 23, 2024
Test4::C => "C",
_ => "D",
};
}
Contributor

Does this still work if Test4 holds a generic type instead of an i32? Should it be made so?

Member Author

If I understand correctly, this will be split into multiple switchInt terminators, so I would expect a generic type to give the same result. But the test case does not show that, so I must be missing something.

Member Author

I added support for generic types.

tests/mir-opt/uninhabited_enum_branching.rs (outdated, resolved)
tests/mir-opt/uninhabited_enum_branching.rs (resolved)
@DianQK DianQK force-pushed the otherwise_is_last_variant_switchs branch from 5617d16 to 5a398d3 Compare January 24, 2024 00:30
@rustbot rustbot added the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Jan 24, 2024
@DianQK
Member Author

DianQK commented Jan 24, 2024

Based on the discussion in zulipchat, I have changed src/bootstrap/src/core/build_steps/test.rs.

@DianQK DianQK force-pushed the otherwise_is_last_variant_switchs branch from af2420c to d02299c Compare January 24, 2024 13:23
@oli-obk
Contributor

oli-obk commented Jan 24, 2024

@bors r+ rollup=never

@bors
Contributor

bors commented Jan 24, 2024

📌 Commit d02299c has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 24, 2024
@DianQK
Member Author

DianQK commented Jan 25, 2024

Hmm, I just remembered a possible regression, but I don't think it should block merging this PR, because we're on the road to a better outcome.

#![crate_type = "lib"]

pub enum Bar {
    Foo = 1,
    Bar = 2,
    Baz = 3
}

#[no_mangle]
pub fn lookup(v: Bar) -> i32 {
    match v {
        Bar::Foo => 8,
        Bar::Bar => 9,
        Bar::Baz => 3,
    }
}

#[no_mangle]
pub fn compare(v: Bar) -> i32 {
    match v {
        Bar::Foo => 8,
        Bar::Bar => 9,
        _ => 3,
    }
}

#[no_mangle]
pub fn lookup2(v: Bar) -> i32 {
    match v {
        Bar::Foo => 1,
        Bar::Bar => 2,
        Bar::Baz => 3,
    }
}

#[no_mangle]
pub fn compare2(v: Bar) -> i32 {
    match v {
        Bar::Foo => 1,
        Bar::Bar => 2,
        _ => 3,
    }
}

GodBolt: https://rust.godbolt.org/z/55dq6zbjd

The lookup function will be compiled into a lookup table with a load instruction. I'm not sure whether that is better than what compare generates. See SimplifyCFG.cpp#L6410.
But most scenarios should improve because we provide new unreachability facts, as with lookup2 and compare2.
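
As a rough illustration of the trade-off described above, the following hand-written sketch mimics the two shapes in plain Rust. It is not LLVM's actual output; the names `TABLE`, `lookup_via_table`, and `lookup2_via_cast` are hypothetical, and the only thing taken from the comment is the `Bar` enum and the returned values.

```rust
#[derive(Clone, Copy)]
pub enum Bar {
    Foo = 1,
    Bar = 2,
    Baz = 3,
}

// Hand-written stand-in for a table-based lowering of `lookup`:
// an index computation followed by one load.
const TABLE: [i32; 3] = [8, 9, 3];

pub fn lookup_via_table(v: Bar) -> i32 {
    // Discriminants start at 1, so shift to a 0-based index.
    TABLE[v as usize - 1]
}

// In `lookup2` every result equals the discriminant, so once all three
// arms are known to be the only possibilities, the match could in
// principle reduce to a plain cast.
pub fn lookup2_via_cast(v: Bar) -> i32 {
    v as i32
}
```

Calling `lookup_via_table(Bar::Foo)` yields the same value as the `lookup` function in the comment, and `lookup2_via_cast` matches `lookup2` for every variant.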

@bors
Contributor

bors commented Jan 26, 2024

⌛ Testing commit d02299c with merge 4b854f3...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 26, 2024
…tchs, r=oli-obk

@bors
Contributor

bors commented Jan 26, 2024

💔 Test failed - checks-actions

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 8, 2024
@oli-obk
Contributor

oli-obk commented Mar 8, 2024

@bors r+

@bors
Contributor

bors commented Mar 8, 2024

📌 Commit 2884230 has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 8, 2024
Member

There is no uninhabited enum anywhere in this test... how does the test filename make sense?

Member Author

This test case came directly from the issue it fixes. It calls partial_cmp, so it's essentially the same as #119520 (comment). I think it makes sense to add the test code from the issue; maybe I should create two test cases.

Member

@RalfJung RalfJung Mar 8, 2024

But again there's no uninhabited enums anywhere that I can see, so (a) what does the test content have to do with the filename, and (b) what does it have to do with this PR?

Member

Reading the code a bit more, I think the MIR pass (and associated test) are misnamed. This is no longer just about uninhabited variants, it is now also about exploiting that Discriminant will never return something that isn't a variant index. I am a bit surprised that this is done as a MIR transform rather than during MIR building but the MIR transform is correct according to our current understanding of MIR semantics. Just the name is misleading after this PR.
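
The idea described above can be sketched with a simplified model. This is not rustc's implementation: `Target`, `Switch`, and `replace_otherwise` are hypothetical stand-ins, and the model ignores the extra conditions the real pass applies. It only shows the core reasoning: if the discriminant can only be one of the inhabited variants and all but one are handled explicitly, the default branch stands for exactly that one variant.

```rust
use std::collections::BTreeSet;

/// Where a branch of the simplified switch goes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Target {
    Block(usize),
    Unreachable,
}

/// Hypothetical, simplified stand-in for a MIR `switchInt` terminator.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Switch {
    /// Explicit `(discriminant value, target)` arms.
    pub arms: Vec<(u128, Target)>,
    /// The `otherwise` (default) target.
    pub otherwise: Target,
}

/// If exactly one inhabited variant is not covered by an explicit arm,
/// the `otherwise` branch can only be taken for that variant. Give it an
/// explicit arm and mark `otherwise` unreachable.
pub fn replace_otherwise(switch: &mut Switch, inhabited: &BTreeSet<u128>) {
    let covered: BTreeSet<u128> = switch.arms.iter().map(|&(v, _)| v).collect();
    let mut remaining = inhabited.difference(&covered);
    if let (Some(&last), None) = (remaining.next(), remaining.next()) {
        switch.arms.push((last, switch.otherwise));
        switch.otherwise = Target::Unreachable;
    }
}
```

With arms for `255` and `0` and inhabited values `{255, 0, 1}`, the sketch adds an arm for `1` pointing at the old default block and turns the default into `Unreachable`.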

Member Author

(a) I can change the file name to issue-119520.rs.
(b) https://rust.godbolt.org/z/za5c5hzoY When I reduced the issue's test case, I found that it uses Ordering after inlining, so an enum is involved implicitly. I also hope that this test case will not lose the optimization due to other changes in the future.

Copy link
Member Author

@DianQK DianQK Mar 8, 2024

Yes. I'm considering updating the name. (It’s just that I didn’t think of a suitable name.)
Maybe I can change it to UnreachableEnumBranching.

> I am a bit surprised that this is done as a MIR transform rather than during MIR building but the MIR transform is correct according to our current understanding of MIR semantics. Just the name is misleading after this PR.

It seems better to me to have MIR building match the structure of the code itself where possible. (Though this goal may not matter much either?)

Member

> It seems better to me to have MIR building match the structure of the code itself where possible. (Though this goal may not matter much either?)

Ah, I may have misunderstood where this optimization kicks in. I thought even this would just use fallback for the last variant:

    match c {
        Less => -5,
        Equal => 0,
        Greater => 42,
    }

But already on stable that becomes switchInt(move _2) -> [255: bb3, 0: bb4, 1: bb1, otherwise: bb2].

> I can change the file name to issue-119520.rs.

Once we have a better name for the pass, it can use that name. (Though it would also be good to mention the issue either in the file name or file contents. It's always good to add more cross-references and those are otherwise much harder to reconstruct in the future.)

> Maybe I can change it to UnreachableEnumBranching.

I like it. :) The module-level comment in that file should then explain the two ways that "unreachable" is determined.
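
For readers puzzled by the `255` in the `switchInt` quoted above: it is `Ordering::Less`, whose discriminant is `-1`, viewed as an unsigned 8-bit value. A small check, where `discr_bits` is a hypothetical helper name introduced only for this illustration:

```rust
use std::cmp::Ordering;

/// Reinterpret an `Ordering` discriminant as the unsigned byte that
/// shows up in the `switchInt` value list.
pub fn discr_bits(o: Ordering) -> u8 {
    // `Ordering` is `#[repr(i8)]` with Less = -1, Equal = 0, Greater = 1.
    (o as i8) as u8
}
```

So the three explicit arms `255`, `0`, and `1` cover `Less`, `Equal`, and `Greater` respectively, leaving nothing for `otherwise`.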

Member Author

> Ah, I may have misunderstood where this optimization kicks in. I thought even this would just use fallback for the last variant:
>
>     match c {
>         Less => -5,
>         Equal => 0,
>         Greater => 42,
>     }
>
> But already on stable that becomes switchInt(move _2) -> [255: bb3, 0: bb4, 1: bb1, otherwise: bb2].

This is something UninhabitedEnumBranching has already done.

This PR transforms the following code

    match c {
        Less => -5,
        Equal => 0,
        _ => 42,
    }

to

    match c {
        Less => -5,
        Equal => 0,
        Greater => 42,
    }

.
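
The two `match` forms above can be put side by side in a self-contained sketch, assuming `c` is a `std::cmp::Ordering`. The function names `with_wildcard` and `with_explicit_variant` are made up for illustration. Both functions return the same value for every variant, which is why the wildcard arm can be treated as the last remaining variant.

```rust
use std::cmp::Ordering::{self, Equal, Greater, Less};

/// The form before the transformation: the last variant is matched by `_`.
pub fn with_wildcard(c: Ordering) -> i32 {
    match c {
        Less => -5,
        Equal => 0,
        _ => 42,
    }
}

/// The form after the transformation: every variant is named explicitly,
/// so the default branch is known to be unreachable.
pub fn with_explicit_variant(c: Ordering) -> i32 {
    match c {
        Less => -5,
        Equal => 0,
        Greater => 42,
    }
}
```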

&& allowed_variants.len() == 1
&& check_successors(&body.basic_blocks, targets.otherwise());
let replace_otherwise_to_unreachable = otherwise_is_last_variant
|| !otherwise_is_empty_unreachable && allowed_variants.is_empty();
Member

@RalfJung RalfJung Mar 8, 2024

This could use a few more comments explaining what happens here -- why all these checks are needed and why they are combined in exactly the way they are. Imagine someone reading this code in a year without knowing about this PR -- what would they have to know to make sense of all this? For instance, what is check_successors even checking?

Also, a || b && c could use parentheses, the precedence is currently unclear.
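
For reference, Rust's `&&` binds tighter than `||`, so `a || b && c` parses as `a || (b && c)`. The sketch below spells this out with plain booleans and hypothetical function names; it does not reproduce the pass's actual conditions.

```rust
/// Written without parentheses, relying on operator precedence.
pub fn implicit(a: bool, b: bool, c: bool) -> bool {
    a || b && c
}

/// The same expression with the grouping made explicit.
pub fn explicit(a: bool, b: bool, c: bool) -> bool {
    a || (b && c)
}

/// The other possible reading, which gives a different answer
/// whenever `a` is true and `c` is false.
pub fn other_grouping(a: bool, b: bool, c: bool) -> bool {
    (a || b) && c
}
```

Adding the parentheses changes nothing for the compiler, but it removes the need for a reader to recall the precedence rule.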

Member Author

Hmm, testing has already started. Should this be r-, or shall I address it in a follow-up PR?

Member

Subsequent PR is fine.

@bors
Contributor

bors commented Mar 8, 2024

⌛ Testing commit 2884230 with merge 14fbc3c...

@bors
Contributor

bors commented Mar 8, 2024

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 14fbc3c to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 8, 2024
@bors bors merged commit 14fbc3c into rust-lang:master Mar 8, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Mar 8, 2024
@DianQK DianQK deleted the otherwise_is_last_variant_switchs branch March 8, 2024 09:36
@rust-timer
Collaborator

Finished benchmarking commit (14fbc3c): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.2%, 1.8%] | 4 |
| Regressions ❌ (secondary) | 0.2% | [0.2%, 0.3%] | 7 |
| Improvements ✅ (primary) | -0.8% | [-1.2%, -0.3%] | 5 |
| Improvements ✅ (secondary) | -0.9% | [-2.2%, -0.3%] | 3 |
| All ❌✅ (primary) | -0.1% | [-1.2%, 1.8%] | 9 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.9% | [0.2%, 8.8%] | 4 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.6% | [-6.3%, -2.2%] | 3 |
| Improvements ✅ (secondary) | -5.4% | [-5.4%, -5.4%] | 1 |
| All ❌✅ (primary) | 0.7% | [-6.3%, 8.8%] | 7 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.0% | [2.0%, 2.0%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.0% | [2.0%, 2.0%] | 1 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.2%] | 11 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 7 |
| Improvements ✅ (primary) | -0.2% | [-1.0%, -0.0%] | 35 |
| Improvements ✅ (secondary) | -0.1% | [-1.6%, -0.0%] | 15 |
| All ❌✅ (primary) | -0.1% | [-1.0%, 0.2%] | 46 |

Bootstrap: 646.506s -> 648.483s (0.31%)
Artifact size: 172.55 MiB -> 172.46 MiB (-0.05%)

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 9, 2024
 Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching`

Per [rust-lang#120268](rust-lang#120268 (comment)), I renamed `UninhabitedEnumBranching` to `UnreachableEnumBranching`.

I addressed some nits and added some comments.

I adjusted the workaround restrictions. This should be useful for `a <= b` and `if let Some/Ok(v)`. For enums with few variants, `early-tailduplication` should not cause compile-time overhead.

r? RalfJung
@pnkfelix
Member

pnkfelix commented Mar 12, 2024

Visiting for weekly performance triage.

@DianQK the 1.8% regression to cargo opt-full is concerning to me. But from looking at the earlier rust-timer invocations, I saw it come up only once, not every time. So it's not clear to me how much it was anticipated in your developments here.

Do you have any idea where the cargo opt-full regression is arising? Is it somehow connected to llvm/llvm-project#78578 ?

(not marking as triaged, not yet.)

DianQK added a commit to DianQK/rust that referenced this pull request Mar 13, 2024
…iant_switchs, r=oli-obk"

This reverts commit 14fbc3c, reversing
changes made to 9fb91aa.
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 13, 2024
@DianQK
Member Author

DianQK commented Mar 13, 2024

I don't think this is a regression. This is the result of the previous try: #120268 (comment). I don't see any cargo-related regressions there.
I made another attempt by reverting this PR: #122414. There I can only see cranelift-codegen returning to its previous result.

Or maybe I'm missing something.
In general, I think adding a fact may provide more opportunities for optimization, which in turn means more optimization work to do. This may be the desired result.

@DianQK
Member Author

DianQK commented Mar 13, 2024

> Is it somehow connected to llvm/llvm-project#78578?

I don't think it's related. If it were, I would expect the output size to increase, because early-tailduplication duplicates instructions. But the actual result is -0.38%.

@pnkfelix
Member

pnkfelix commented Mar 14, 2024

We discussed the cargo opt-full result a little at T-compiler meeting today (zulip)

We decided that, judging from the chart, it does seem to have been subsequently resolved (potentially by PR #120985).

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Mar 14, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 3, 2024
 Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching`

github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Apr 4, 2024
 Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching`

lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Apr 7, 2024
 Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching`

RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
 Rename `UninhabitedEnumBranching` to `UnreachableEnumBranching`

Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.