-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DebugInfo@O2] Too much indirection added to stack pointer's variable location #41020
Comments
assigned to @jmorse |
Does DebugLocEntry::Value distinguish between memory vs. register location descriptions? I see E_Location.. that wraps a MachineLocation.. that has an IsRegister bit :).
So if Op1 is an immediate, it's a register location. I guess we get confused here because Op1 is $noreg. |
Sorry -- I had this backwards. If Op1 is an immediate, it's not a register location. So Op1 = $noreg would be a memory location. |
Still wrong -- DBG_VALUE $somereg, $noreg makes for a register location. I think this means that the extra indirection bug occurs after the DebugLocEntry is created. |
Vedant wrote:
Indeed -- I think our definition of what's a register location and what is not, in a DBG_VALUE, is conflicting with the dwarf backend. My reading of the dwarf expression producer is that it'll swallow any register location temporarily, and then if there are any further opcodes in DIExpression it'll emit a memory reference. The only call to addReg (for register locations) in the file is guarded by !HasComplexExpression [0], thus it becomes addBReg if there's any complex expression. That then means if a DBG_VALUEs Op1 = imm then a memory reference is forced; but if Op1 = $noreg then it can be emitted as either a register or memory reference, depending on what's in the DIExpression. We could force Op1=$noreg to produce a register reference, however looking at section 2.6.1.1.2 of the DWARF4 spec, my interpretation is that it's not permissible to produce an expression with DW_OP_reg* and any other operation. If so, that would then mean the only way to produce the outcome we desire would be through DW_OP_stack_value. And if so, where should that be added? (I'm not a DWARF expert so this is largely guesswork, I'll go find some DWARF experts though...) [0]
|
The first question is, what does the DBG_VALUE describe? Is it describing a place in the computer that contains the variable's value? Is it describing a case where the value per se does not exist, but we know If you can't tell, based on looking at the DBG_VALUE, whether it is If DBG_VALUE describes a place containing the value, and that place happens |
More experimentation with the initial fix I suggested shows that the test failures are not significant: out of six, all are because we have a spurious pair of DW_OP_deref & DW_OP_stack_value. That makes two tests fail because stack_value isn't in DWARF 2, and textual output has a minor change in the other four. It looks like DwarfExpression often swallows [0] trailing DW_OP_derefs, this patch causes it to instead emit the deref/stack_value pair, which is effectively a no-op. It seems pretty dumb to emit things like that, so I'll dig further. Paul wrote:
It doesn't help that a DIExpression is sometimes a memory reference and sometimes not, and working out which may involve interpreting the expression. I'll try and cook up some counterexamples to show whether it's properly broken or not. |
A deref/stack_value pair isn't exactly a no-op; it makes the value read-only, |
Examining the interaction between the "Indirect" flag in a DBG_VALUE and the presence of DW_OP_deref and the like, I found the behaviours shown below. I took a random test, compiled it to the last llc pass (LLVM r361045), then fiddled with a DBG_VALUE on $rsp to see what output location was picked by the DWARF code. On the left hand side, the "Indirect" flag and DIExpression of the DBG_VALUE, on the right hand side, the corresponding DWARF location: $noreg, (plus_uconst, 8), -> DW_OP_breg7 RSP+8) I've cut DW_OP_ from a few constants to make that all fit. It seems the "indirect" flag sometimes has an effect, and sometimes does not, which has the potential for confusion. At the end of the day, there should only be one way of encoding the same information (I believe this is an accepted proverb). It's also unfortunate (IMHO) that the DwarfExpression class I'll now take a look at making the link between "indirect" and the DIExpression more explicit. AFAIUI the only place we really use it to make a decision is when adding extra layers of indirection to spilt locations. |
NB: after some prodding, I'm of the opinion that the "isIndirect" flag of a DBG_VALUE is technically redundant. As far as I'm aware, the purpose of the flag is so that we can easily refer to the value of a stack slot when a spill occurs. But now that dbg.addr intrinsics can generate DBG_VALUE insts that are already indirect, we have to cope with multiple levels of indirection [0], so little has been gained. (Assuming my assumption about its purpose is correct). That leaves us with the isIndirect flag adding confusion but no additional functionality (IMHO). One can express the same location in different ways with isIndirect, for example: DBG_VALUE $reg, $noreg, !foo, !DIExpression(DW_OP_deref) and as shown with comment 8, sometimes the addition of isIndirect doesn't actually make a difference to the location. Soooooo, I've taken a shot at getting rid of isIndirect -- first by never generating any indirect DBG_VALUEs (and instead appending DW_OP_deref to the relevant expressions), maybe later by eliminating the operand entirely. I haven't yet managed to generate an identical copy of clang-3.4, which I believe is because I don't fully understand what the DW_OP_deref_size work in PrologEpilogInserter.cpp does. All the test failures are minor changes in representation and bug 41992, nothing serious. I imagine I'll have to mail llvm-dev at some point to check that there isn't some part of the design I've missed; this is more to record how I currently view this bug. [0]
|
Not sure when, exactly, isIndirect ended up on DBG_VALUE, maybe it was there before
I /think/ I'm probably to blame for isindirect - though it's a twisty path and it might be an unrelated (or at least different, but related) case of isindirect that you're talking about here. I originally added an "isindirect" flag on DIVariable - back when we didn't have DIExpression, etc. (see r184365, r184367, and r184368) This was to describe indirect pass by value parameters in C++. Initially (see #15135 for pointers) indirect pass by value parameters had had their type modified in clang so the DWARF would record them as reference parameters, not value parameters - but that wasn't the correct way to represent them (meant that calling the functions in the debugger did the wrong thing (not copying the parameter), among other problems). So I fixed that bug in the frontend in r183329, but that messed up the location descriptions that I then fixed with the addition of the isindirect flag. I think the isindirect flag then ended up being propagated into the early versions of the DIExpression work when that new support may not've been up to the task of describing the indirection in a more general way. I think now DIExpression & co can probably describe this as just another OP_deref or whatever & there's probably no need for special handling. |
Candidate patch for the too-much-indirection issue: https://reviews.llvm.org/D63429 David wrote:
Aha, thanks for the history, that does sound like the source of the isIndirect operand. It seems unlikely that it can be doing anything that can't be done with correct DIExpressions, but I'll mail llvm-dev anyway to avoid anyone getting unexpected surprises, then try to kill it off shortly. |
Fixed in https://reviews.llvm.org/rL364736 ; killing off the isIndirect flag is a non-urgent improvement for the future IMO, I'll likely prod that after llvm-9 branches. |
Extended Description
Spinning out of bug 41534, where Yuanbo has the following code:
--------8<--------
$ cat abc.c
int a;
int *b;
int main() {
int l_1081 = 1834104526;
int *c = &l_1081;
b = c;
*b = a = 9;
optimize_me_not();
}
$ cat outer.c
optimize_me_not() {}
-------->8--------
Compiling with clang/llvm r358073 and options "-O2 -g", printing the "c" variable on any line of the program yields a pointer that doesn't point at the stack. Examining the location:
--------8<--------
DW_AT_location (0x00000000
[0x0000000000400486, 0x00000000004004a9): DW_OP_breg7 RSP+4)
DW_AT_name ("c")
-------->8--------
Shows that the location of 'c' has grown one too many levels of indirection: the variable should be the value of RSP+4 (i.e., a pointer at the stack), not the memory data that the expression points at.
Examining the debug intrinsics for the "c" variable as they change over time, before isel we have:
dbg.value(metadata i32* %l_1081, metadata !19, metadata !DIExpression())
Between isel and prologepilog, where frame indexes are finalised, we get:
And from prologepilog to object emission we get:
DBG_VALUE $rsp, $noreg, !19, !DIExpression(DW_OP_plus_uconst, 4), [...]
My current belief is that switching the DIExpression from being empty to non-empty also implicitly transforms the DBG_VALUE into being a memory location description rather than a register location description. However, I'm unfamiliar with how the DWARF backend treats these things, I'm about 70% confident.
Adding a DW_OP_stack_value to non-indirect DBG_VALUEs when frameindexes are rewritten fixes this example but makes a bunch of tests break, in seemly significant ways. I haven't dug any further, it might be that DW_OP_stack_value is only needed when the DIExpression is initially empty?
The text was updated successfully, but these errors were encountered: