-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Why is the code size of catch_unwind so large ? #64224
Comments
I think the first example with try/catch is a bit misleading. One notable difference between try/catch and std::panic::catch_unwind is that the first provides an option to not actually capture the exception (like your code illustrates). That is something that Rust today cannot do -- which is quite unfortunate, as that is a nonzero codesize burden due to needing to deallocate the Box. If we forget it explicitly instead, then we get somewhat nicer assembly: https://gcc.godbolt.org/z/vs6URt. I'm not sure it matters too much in practice as I'd guess that catch_unwind which ignores the exception itself is somewhat rare, though it might be worth trying to figure out a way that we can thread that information through (it would probably need a second catch_unwind function -- otherwise, I don't think we can thread the information through into the function in anyway, just due to its return type). Separately, C++ also can get away without the separate I'm not sure it's worth optimizing this, though, as catch_unwind is much more rarely called than try/catch in C++, I expect. I have opened a PR (#67502) that slightly optimizes the ABI of __rust_maybe_catch_panic which reduces the stack space utilization and simplifies the code a little; this likely has essentially no runtime impact though. |
Good point. Would there be a way to insert the
When would this function not be unconditionally called with the current codegen ? (i.e. is this an LLVM bug, in which LLVM is not recognizing that this function is unconditionally called?). |
Well, the forget is a memory leak, so I doubt you'd want to auto insert it. Unfortunately I don't think we can really blame llvm here - we're indirecting through an extern C function so llvm basically can't inline anything here (which would be needed I imagine to see that the function is always called). Furthermore it's not even the first thing to be called in the non-aborting case. We do likely have a way out - move the whole catch function to an intrinsic. Then we'd likely get much better results, particularly with -Cpanic=abort, and it wouldn't even be all that hard I imagine to do this. I continue to be unsure that it's worth it. I do recall that Servo mentioned that catch unwind used to be a performance problem for them; I'm not sure if it still is. Maybe @Manishearth knows, or can find out? |
I don't really recall much about that. |
I have no data showing that catch_unwind is currently a performance problem, if that's the question being asked. |
The problem being reported here is that Rust's |
Could we get better codegen if we let |
|
Here's what I got by inlining example::bar:
push r14
push rbx
sub rsp, 24
mov ebx, 42
call qword ptr [rip + foo@GOTPCREL]
.LBB2_6:
mov eax, ebx
add rsp, 24
pop rbx
pop r14
ret
mov rbx, qword ptr [rax + 256]
mov r14, qword ptr [rax + 264]
mov qword ptr [rax + 256], 0
mov rdi, rax
call qword ptr [rip + _Unwind_DeleteException@GOTPCREL]
mov qword ptr [rsp + 8], rbx
mov qword ptr [rsp + 16], r14
test rbx, rbx
je .LBB2_2
mov edi, -1
call qword ptr [rip + update_panic_count@GOTPCREL]
mov ebx, 13
jmp .LBB2_6
.LBB2_2:
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_2]
mov esi, 43
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2
mov rbx, rax
lea rdi, [rsp + 8]
call core::ptr::real_drop_in_place
mov rdi, rbx
call _Unwind_Resume@PLT
ud2 You'll note that the fast path is now almost as efficient as the C++ one (we're just allocating more stack space), and I'm sure we can reduce the slow path by forcing it out to a separate function. I think we should make |
The inlining (and some further work) worked out to produce a PR that @Amanieu and I independently came up with, though we've decided to utilize my previous PR as a base (#67502). That PR is marked as resolving this issue, as well as #64222. Interested parties can take a look at the implementation there which we're still iterating on. |
@Mark-Simulacrum that sounds great. I'm not sure how this issue gets resolved, could you post the codegen of the OP example w/o the patch ? Does that PR produce code that's sufficiently close to what C++ produces ? |
The PR description had a godbolt link and includes two sets of assembly with the same code as from the godbolt link. |
So how does it do for this snippet? #![feature(unwind_attributes)]
extern "C" {
// can unwind:
#[unwind(allow)] fn foo();
}
pub unsafe fn bar() -> i32 {
std::panic::catch_unwind(|| { foo(); 42 }).unwrap_or(13)
} ? |
I don't really have the time to try out lots of snippets; I expect that to be on par, but slightly worse -- the lack of a mem::forget will mean that you have a destructor for the |
I'm just trying to understand how that PR closes this issue, which is not straightforward from the contents of that PR (I understand how it closes the other issue though). |
That PR makes the code size as small as we can, to my knowledge; I've kicked off a try build so that it's easier to test that PR - if there's still suboptimal examples I'd be happy to look at fixing them. I personally suspect that the remainder of the suboptimal examples cannot be fixed without a new, separate function which takes a separate closure or does not capture the exception at all - one that at least conceptually could go in libcore (aside from being platform specific implementation wise). |
@gnzlbg Here's the asm output with that PR: _ZN4test3bar17h0bd71e8ae5322970E:
push r15
push r14
push rbx
mov ebx, 42
call qword ptr [rip + foo@GOTPCREL]
.LBB1_4:
mov eax, ebx
pop rbx
pop r14
pop r15
ret
.LBB1_1:
mov rdi, rax
call qword ptr [rip + _ZN3std9panicking3try7cleanup17hee23cc19e10d1537E@GOTPCREL]
mov r14, rax
mov r15, rdx
mov rdi, rax
call qword ptr [rdx]
mov rsi, qword ptr [r15 + 8]
mov ebx, 13
test rsi, rsi
je .LBB1_4
mov rdx, qword ptr [r15 + 16]
mov rdi, r14
call qword ptr [rip + __rust_dealloc@GOTPCREL]
jmp .LBB1_4
.LBB1_5:
mov rbx, rax
mov rdi, r14
mov rsi, r15
call _ZN5alloc5alloc8box_free17h849e57dccc1d906aE
mov rdi, rbx
call _Unwind_Resume@PLT
ud2 Notes:
|
We're still not reaching code-size parity with C++ for 2 reasons:
|
Could we maybe wrap this in a
This makes sense, I don't think there is a way to handle this any better either. |
The drop isn't controlled by us (occurs in the |
With a separate #![feature(unwind_attributes)]
extern "C" {
#[unwind(allow)]
fn foo();
}
#[cold]
fn drop_box(b: Box<dyn std::any::Any + Send>) {
drop(b);
}
pub unsafe fn bar() -> i32 {
std::panic::catch_unwind(|| {
foo();
42
})
.unwrap_or_else(|e| {
drop_box(e);
13
})
} _ZN4test8drop_box17hee2c4bc13b7e934bE:
push r15
push r14
push rbx
mov rbx, rsi
mov r14, rdi
call qword ptr [rsi]
mov rsi, qword ptr [rbx + 8]
test rsi, rsi
je .LBB1_4
mov rdx, qword ptr [rbx + 16]
mov rdi, r14
pop rbx
pop r14
pop r15
jmp qword ptr [rip + __rust_dealloc@GOTPCREL]
.LBB1_4:
pop rbx
pop r14
pop r15
ret
.LBB1_3:
mov r15, rax
mov rdi, r14
mov rsi, rbx
call _ZN5alloc5alloc8box_free17h849e57dccc1d906aE
mov rdi, r15
call _Unwind_Resume@PLT
ud2
_ZN4test3bar17h0bd71e8ae5322970E:
push rbx
mov ebx, 42
call qword ptr [rip + foo@GOTPCREL]
.LBB2_2:
mov eax, ebx
pop rbx
ret
.LBB2_1:
mov rdi, rax
call qword ptr [rip + _ZN3std9panicking3try7cleanup17hee23cc19e10d1537E@GOTPCREL]
mov rdi, rax
mov rsi, rdx
call _ZN4test8drop_box17hee2c4bc13b7e934bE
mov ebx, 13
jmp .LBB2_2
|
Thanks @Amanieu, that's exactly what I had in mind. I don't see a simple way of doing this by default without making Maaaybe we could provide a specialized impl in default impl<T> Drop for Box<T> { ... }
impl Drop for Box<dyn Any + Send> {
#[cold] fn drop(&mut self) { ... }
} but I'm not sure whether that will have the desired impact, and also whether that would be worth doing even if it did. It would not only impact An intrinsic for |
An intrinsic for catch_unwind wouldn't help I think? Or at least I don't see exactly what you mean by that. I think what could help is |
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
Optimize catch_unwind to match C++ try/catch This refactors the implementation of catching unwinds to allow LLVM to inline the "try" closure directly into the happy path, avoiding indirection. This means that the catch_unwind implementation is (after this PR) zero-cost unless a panic is thrown. https://rust.godbolt.org/z/cZcUSB is an example of the current codegen in a simple case. Notably, the codegen is *exactly the same* if `-Cpanic=abort` is passed, which is clearly not great. This PR, on the other hand, generates the following assembly: ```asm # -Cpanic=unwind: push rbx mov ebx,0x2a call QWORD PTR [rip+0x1c53c] # <happy> mov eax,ebx pop rbx ret mov rdi,rax call QWORD PTR [rip+0x1c537] # cleanup function call call QWORD PTR [rip+0x1c539] # <unfortunate> mov ebx,0xd mov eax,ebx pop rbx ret # -Cpanic=abort: push rax call QWORD PTR [rip+0x20a1] # <happy> mov eax,0x2a pop rcx ret ``` Fixes #64224, and resolves #64222.
While filling #64222 I noticed that we generate more code than C++ for
catch_unwind
. That did not feel right, since C++'scatch
can do much more than Rust's catch unwind, e.g., filtering different types of exceptions, etc.MWE: C++ (https://gcc.godbolt.org/z/z_dgPg):
generates
while Rust (https://gcc.godbolt.org/z/4sbc6k):
generates
This appears to be a constant overhead every time
catch_unwind
is used (e.g. see https://gcc.godbolt.org/z/bAvN24). Maybe we are inlining too much ?The text was updated successfully, but these errors were encountered: