-
Notifications
You must be signed in to change notification settings - Fork 12.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cannot represent a difference across sections due to debuginfo #61932
Comments
@llvm/issue-subscribers-debuginfo |
Bisected to d20e4a1. cc @DianQK @dwblaikie |
It looks like test2 being "cold" controls this -- looking at the textual assembly, .debug_ranges is being generated thus:
Where the temporary symbols are in the test1 function, and .Lfunc_begin1 is the start of test2. test2 ends up in a different section to test1, so this is an illegal cross-section reference. Exactly why this happens is pretty unclear though: if I move test2 to the start of the input file, .debug_ranges still tries to describe test1 in terms of the end of test2, so there's something in the debug-info metadata that causes the two to be connected. test1's scope appears to be deeply buried in some types that are used for the global variable in !8, which is in CU !4's retained_nodes list. However, test1 says that it belongs to CU !1. Breaking this connection at various points in the type-scope chain, or test1's CU, makes the error go away. This gives me the impression that this is a debug-info deduplication / linking error? Exactly how that translates to test1's .debug_ranges connecting to another function still isn't clear. (The type hierarchy metadata isn't my strong point). |
It may be time to apply https://reviews.llvm.org/P8297 to solve various such problems. @dwblaikie |
@nikic @jmorse I probably figured out why the debug info is often inconsistent under LTO here.
The core issue I found is that DISubprogram operations usually use getUnit, while its DIE uses getScope. Once these two are associated with different CUs, associated DIEs may use another CU. If we can make the associated CU for each DIE unambiguous, these kinds of problems should all go away. But what are the rules for determining who the parent DIE of a DIE is? Ignoring the unit and following the scope chain might be the way to go. I have created a patch that is just working. |
It looks like there is no consensus on the proposed patch (https://reviews.llvm.org/D147620). @dwblaikie is arguing that the debuginfo should not be considered valid in the first place. It sounds like for LLVM 17, a new debuginfo verifier check needs to be implemented and Rust's debuginfo generation adjusted to comply with the new check. However, this regression still needs to be fixed for LLVM 16, in a way that works with the existing debuginfo format. This is a pretty critical issue, because it breaks building some Rust projects (including Firefox) with debuginfo and LTO. As such, I'd like to propose that we just revert the original change for LLVM 16 and then work on better debuginfo verification (and fixing the fallout it entails) for LLVM 17. /branch nikic/llvm-project/revert-pr61932 |
/pull-request llvm/llvm-project-release-prs#438 |
What's the behavior before the regression (which was a bug fix for incorrect ranges base addresses, I think?), though? I think it was probably also broken - though perhaps silently so (& maybe that's acceptable, if unfortunate)? |
@dwblaikie I'm not sure about the rest of it, but at least debug_ranges used to be well-formed. Here's the diff between LLVM 15 and 16: https://llvm.godbolt.org/z/1EsKP73Y8 Though for now my primary concern is indeed just that debuginfo generation doesn't break compilation outright -- if the generated DWARF is inaccurate in some way that would be a longer term concern. |
@DianQK - do you have a sense of/diff of the effect on the example in this bug, and in the original patch, of reverting your patch? (the patch suggests it fixes some base address issues) Is it the case that reverting this patch doesn't get back to LLVM 15 behavior? Rejecting the LLVM metadata in the verifier would cause it to drop all the debug info that contains a verification failure - this'd be about the same as building the Rust code without debug info. |
Reverting the patch is fine for me. This patch can cause Rust to fail to generate object files in some cases. Also, I realized that switching CU in such a depth of logic is not a good way to go. This patch cannot handle base address adjustment for all shared DIE cases.
We can continue to discuss a better solution in D147620. In fact I have reverted D136039 in D147620 (D58538 also). I added a new comment to D147620, and the discovery at 73d697c makes me think that the new patch is a more suitable solution. |
rust-lang/rust#111167 fixes the debuginfo generation on the rustc side, in line with @dwblaikie's recommendation. Assuming that gets approved for backport there will be no need to do anything for LLVM 16. We should still implement a verifier check on the LLVM side. There is a suspicion that this has also caused a number of other issues that were never fully understood, so we'll want to save other people the trouble :) |
thumbs up
Yeah... would be good. @DianQK any chance you'd be up for trying that? (a verifier check that checks inlined subroutines and definition subprograms and checks that they refer to a declaration, never directly to their enclosing class?) |
I can try it out in my free time. This verifier check should be helpful. This issue has existed for a long time in Rust and Swift. |
Hmm do you have examples of the problem in Swift? Are they going to change to add function declarations like Rust has changed now? (@adrian-prantl @JDevlieghere ) |
I have submitted a validation patch: https://reviews.llvm.org/D152095. |
…Type when enabling ODR. Resolve llvm#61932. We should add the validation. LLVM can't handle IR where subprogram definitions are nested within DICompositeType when doing LTO builds, because there's no good way to cross the CU boundary to insert a nested DISubprogram definition in one CU into a type defined in another CU. The test cases `cross-cu-inlining-2.ll` and `cross-cu-inlining-ranges.ll` can be deleted. In the `cross-cu-inlining-2.ll`, the low pc and high pc of the CU are also incorrect. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D152095
Running https://gist.github.com/nikic/6f60958ed4295f0bcd48797b5e541aba through
llc -filetype=obj
we get errors like this:This is related to the debug info in some way.
The text was updated successfully, but these errors were encountered: