Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DebugInfo@O2] Too much indirection added to stack pointer's variable location #41020

Closed
jmorse opened this issue Apr 30, 2019 · 13 comments
Closed
Assignees
Labels
bugzilla Issues migrated from bugzilla llvm:codegen wrong-debug

Comments

@jmorse
Copy link
Member

jmorse commented Apr 30, 2019

Bugzilla Link 41675
Resolution FIXED
Resolved on Jul 01, 2019 04:56
Version trunk
OS Linux
Blocks #38116
CC @adrian-prantl,@dwblaikie,@gregbedwell,@jdm,@OCHyams,@pogo59,@pogo59,@Melamoto,@vedantk
Fixed by commit(s) r364736

Extended Description

Spinning out of bug 41534, where Yuanbo has the following code:

--------8<--------
$ cat abc.c
int a;
int *b;
int main() {
int l_1081 = 1834104526;
int *c = &l_1081;
b = c;
*b = a = 9;
optimize_me_not();
}

$ cat outer.c
optimize_me_not() {}
-------->8--------

Compiling with clang/llvm r358073 and options "-O2 -g", printing the "c" variable on any line of the program yields a pointer that doesn't point at the stack. Examining the location:

--------8<--------
DW_AT_location (0x00000000
[0x0000000000400486, 0x00000000004004a9): DW_OP_breg7 RSP+4)
DW_AT_name ("c")
-------->8--------

Shows that the location of 'c' has grown one too many levels of indirection: the variable should be the value of RSP+4 (i.e., a pointer at the stack), not the memory data that the expression points at.

Examining the debug intrinsics for the "c" variable as they change over time, before isel we have:

dbg.value(metadata i32* %l_1081, metadata !​19, metadata !DIExpression())

Between isel and prologepilog, where frame indexes are finalised, we get:

DBG_VALUE %stack.0.l_1081, $noreg, !&#8203;19, !DIExpression(), [...]

And from prologepilog to object emission we get:

DBG_VALUE $rsp, $noreg, !​19, !DIExpression(DW_OP_plus_uconst, 4), [...]

My current belief is that switching the DIExpression from being empty to non-empty also implicitly transforms the DBG_VALUE into being a memory location description rather than a register location description. However, I'm unfamiliar with how the DWARF backend treats these things, I'm about 70% confident.

Adding a DW_OP_stack_value to non-indirect DBG_VALUEs when frameindexes are rewritten fixes this example but makes a bunch of tests break, in seemly significant ways. I haven't dug any further, it might be that DW_OP_stack_value is only needed when the DIExpression is initially empty?

@jmorse
Copy link
Member Author

jmorse commented Apr 30, 2019

assigned to @jmorse

@vedantk
Copy link
Collaborator

vedantk commented Apr 30, 2019

Does DebugLocEntry::Value distinguish between memory vs. register location descriptions? I see E_Location.. that wraps a MachineLocation.. that has an IsRegister bit :).

  explicit MachineLocation(unsigned R, bool Indirect = false)
      : IsRegister(!Indirect), Register(R) {}

// Get .debug_loc entry for the instruction range starting at MI.
static DebugLocEntry::Value getDebugLocValue(const MachineInstr *MI) {
  const DIExpression *Expr = MI->getDebugExpression();
  assert(MI->getNumOperands() == 4);
  if (MI->getOperand(0).isReg()) {
    auto RegOp = MI->getOperand(0);
    auto Op1 = MI->getOperand(1);
    // If the second operand is an immediate, this is a
    // register-indirect address.
    assert((!Op1.isImm() || (Op1.getImm() == 0)) && "unexpected offset");
    MachineLocation MLoc(RegOp.getReg(), Op1.isImm());
    return DebugLocEntry::Value(Expr, MLoc);
  }

So if Op1 is an immediate, it's a register location. I guess we get confused here because Op1 is $noreg.

@vedantk
Copy link
Collaborator

vedantk commented Apr 30, 2019

So if Op1 is an immediate, it's a register location. I guess we get confused
here because Op1 is $noreg.

Sorry -- I had this backwards. If Op1 is an immediate, it's not a register location. So Op1 = $noreg would be a memory location.

@vedantk
Copy link
Collaborator

vedantk commented Apr 30, 2019

So if Op1 is an immediate, it's a register location. I guess we get confused
here because Op1 is $noreg.

Sorry -- I had this backwards. If Op1 is an immediate, it's not a register
location. So Op1 = $noreg would be a memory location.

Still wrong -- DBG_VALUE $somereg, $noreg makes for a register location. I think this means that the extra indirection bug occurs after the DebugLocEntry is created.

@jmorse
Copy link
Member Author

jmorse commented May 1, 2019

Vedant wrote:

Still wrong -- DBG_VALUE $somereg, $noreg makes for a register location. I
think this means that the extra indirection bug occurs after the
DebugLocEntry is created.

Indeed -- I think our definition of what's a register location and what is not, in a DBG_VALUE, is conflicting with the dwarf backend. My reading of the dwarf expression producer is that it'll swallow any register location temporarily, and then if there are any further opcodes in DIExpression it'll emit a memory reference. The only call to addReg (for register locations) in the file is guarded by !HasComplexExpression [0], thus it becomes addBReg if there's any complex expression.

That then means if a DBG_VALUEs Op1 = imm then a memory reference is forced; but if Op1 = $noreg then it can be emitted as either a register or memory reference, depending on what's in the DIExpression.

We could force Op1=$noreg to produce a register reference, however looking at section 2.6.1.1.2 of the DWARF4 spec, my interpretation is that it's not permissible to produce an expression with DW_OP_reg* and any other operation. If so, that would then mean the only way to produce the outcome we desire would be through DW_OP_stack_value. And if so, where should that be added?

(I'm not a DWARF expert so this is largely guesswork, I'll go find some DWARF experts though...)

[0]

if (LocationKind != Memory && !HasComplexExpression) {

@pogo59
Copy link
Collaborator

pogo59 commented May 1, 2019

We could force Op1=$noreg to produce a register reference, however looking
at section 2.6.1.1.2 of the DWARF4 spec, my interpretation is that it's not
permissible to produce an expression with DW_OP_reg* and any other
operation. If so, that would then mean the only way to produce the outcome
we desire would be through DW_OP_stack_value. And if so, where should that
be added?

(I'm not a DWARF expert so this is largely guesswork, I'll go find some
DWARF experts though...)

The first question is, what does the DBG_VALUE describe?

Is it describing a place in the computer that contains the variable's value?
In that case you do not want DW_OP_stack_value, you want to produce an
expression that describes the place.

Is it describing a case where the value per se does not exist, but we know
how to reconstruct it? In that case you do want DW_OP_stack_value, because
otherwise we have to tell the debugger that the variable has been optimized
away (which we do by not emitting an expression at all).

If you can't tell, based on looking at the DBG_VALUE, whether it is
describing a place containing the value or the value itself, then that is a
design flaw in how DBG_VALUE works. Sort that out first.

If DBG_VALUE describes a place containing the value, and that place happens
to be a register, then you name that place with DW_OP_reg*. If the place is
not a register, but has an address contained in a register (or derived from
it, perhaps as an offset from a base register), then you use DW_OP_breg* and
go from there.

@jmorse
Copy link
Member Author

jmorse commented May 20, 2019

More experimentation with the initial fix I suggested shows that the test failures are not significant: out of six, all are because we have a spurious pair of DW_OP_deref & DW_OP_stack_value. That makes two tests fail because stack_value isn't in DWARF 2, and textual output has a minor change in the other four.

It looks like DwarfExpression often swallows [0] trailing DW_OP_derefs, this patch causes it to instead emit the deref/stack_value pair, which is effectively a no-op. It seems pretty dumb to emit things like that, so I'll dig further.

Paul wrote:

Is it describing a place in the computer that contains the variable's value?
In that case you do not want DW_OP_stack_value, you want to produce an
expression that describes the place.
[...]
If you can't tell, based on looking at the DBG_VALUE, whether it is
describing a place containing the value or the value itself, then that is a
design flaw in how DBG_VALUE works. Sort that out first.

It doesn't help that a DIExpression is sometimes a memory reference and sometimes not, and working out which may involve interpreting the expression. I'll try and cook up some counterexamples to show whether it's properly broken or not.

[0] https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/AsmPrinter/DwarfExpression.cpp#L377

@pogo59
Copy link
Collaborator

pogo59 commented May 20, 2019

It looks like DwarfExpression often swallows [0] trailing DW_OP_derefs, this
patch causes it to instead emit the deref/stack_value pair, which is
effectively a no-op. It seems pretty dumb to emit things like that, so I'll
dig further.

A deref/stack_value pair isn't exactly a no-op; it makes the value read-only,
which might be useful in certain situations, but I think not in this case.
In any case we should not be emitting that kind of expression without the
intent of making it read-only.

@jmorse
Copy link
Member Author

jmorse commented May 20, 2019

Examining the interaction between the "Indirect" flag in a DBG_VALUE and the presence of DW_OP_deref and the like, I found the behaviours shown below. I took a random test, compiled it to the last llc pass (LLVM r361045), then fiddled with a DBG_VALUE on $rsp to see what output location was picked by the DWARF code. On the left hand side, the "Indirect" flag and DIExpression of the DBG_VALUE, on the right hand side, the corresponding DWARF location:

$noreg, (plus_uconst, 8), -> DW_OP_breg7 RSP+8)
0, (plus_uconst, 8), -> DW_OP_breg7 RSP+8)
$noreg, (plus_uconst, 8, stackval), -> DW_OP_breg7 RSP+8, stackval
0, (plus_uconst, 8, stackval), -> DW_OP_breg7 RSP+8, stackval
$noreg, (plus_uconst, 8, deref), -> DW_OP_breg7 RSP+8
0, (plus_uconst, 8, deref), -> DW_OP_breg7 RSP+8, deref
$noreg, (plus_uconst, 8, deref, stackval)-> DW_OP_breg7 RSP+8, deref, stackval
0, (plus_uconst, 8, deref, stackval)-> DW_OP_breg7 RSP+8, deref, stackval
$noreg, (), -> DW_OP_reg7 RSP
0, (), -> DW_OP_breg7 RSP+0
$noreg, (deref), -> DW_OP_breg7 RSP+0
0, (deref), -> DW_OP_breg7 RSP+0, DW_OP_deref

I've cut DW_OP_ from a few constants to make that all fit.

It seems the "indirect" flag sometimes has an effect, and sometimes does not, which has the potential for confusion. At the end of the day, there should only be one way of encoding the same information (I believe this is an accepted proverb). It's also unfortunate (IMHO) that the DwarfExpression class
guesses the type of location, rather than being told / relying on what's in the IR.

I'll now take a look at making the link between "indirect" and the DIExpression more explicit. AFAIUI the only place we really use it to make a decision is when adding extra layers of indirection to spilt locations.

@jmorse
Copy link
Member Author

jmorse commented Jun 13, 2019

NB: after some prodding, I'm of the opinion that the "isIndirect" flag of a DBG_VALUE is technically redundant. As far as I'm aware, the purpose of the flag is so that we can easily refer to the value of a stack slot when a spill occurs. But now that dbg.addr intrinsics can generate DBG_VALUE insts that are already indirect, we have to cope with multiple levels of indirection [0], so little has been gained. (Assuming my assumption about its purpose is correct).

That leaves us with the isIndirect flag adding confusion but no additional functionality (IMHO). One can express the same location in different ways with isIndirect, for example:

DBG_VALUE $reg, $noreg, !foo, !DIExpression(DW_OP_deref)
DBG_VALUE $reg, 0, !foo, !DIExpression()

and as shown with comment 8, sometimes the addition of isIndirect doesn't actually make a difference to the location.

Soooooo, I've taken a shot at getting rid of isIndirect -- first by never generating any indirect DBG_VALUEs (and instead appending DW_OP_deref to the relevant expressions), maybe later by eliminating the operand entirely. I haven't yet managed to generate an identical copy of clang-3.4, which I believe is because I don't fully understand what the DW_OP_deref_size work in PrologEpilogInserter.cpp does. All the test failures are minor changes in representation and bug 41992, nothing serious.

I imagine I'll have to mail llvm-dev at some point to check that there isn't some part of the design I've missed; this is more to record how I currently view this bug.

[0]

// If the location was spilled, the new DBG_VALUE will be indirect. If the

@dwblaikie
Copy link
Collaborator

NB: after some prodding, I'm of the opinion that the "isIndirect" flag of a
DBG_VALUE is technically redundant. As far as I'm aware, the purpose of the
flag is so that we can easily refer to the value of a stack slot when a
spill occurs. But now that dbg.addr intrinsics can generate DBG_VALUE insts
that are already indirect, we have to cope with multiple levels of
indirection [0], so little has been gained. (Assuming my assumption about
its purpose is correct).

That leaves us with the isIndirect flag adding confusion but no additional
functionality (IMHO). One can express the same location in different ways
with isIndirect, for example:

DBG_VALUE $reg, $noreg, !foo, !DIExpression(DW_OP_deref)
DBG_VALUE $reg, 0, !foo, !DIExpression()

and as shown with comment 8, sometimes the addition of isIndirect doesn't
actually make a difference to the location.

Soooooo, I've taken a shot at getting rid of isIndirect -- first by never
generating any indirect DBG_VALUEs (and instead appending DW_OP_deref to the
relevant expressions), maybe later by eliminating the operand entirely. I
haven't yet managed to generate an identical copy of clang-3.4, which I
believe is because I don't fully understand what the DW_OP_deref_size work
in PrologEpilogInserter.cpp does. All the test failures are minor changes in
representation and bug 41992, nothing serious.

I imagine I'll have to mail llvm-dev at some point to check that there isn't
some part of the design I've missed; this is more to record how I currently
view this bug.

[0]
https://github.com/llvm/llvm-project/blob/
b9f1e7b/llvm/lib/CodeGen/LiveDebugVariables.
cpp#L1302

Not sure when, exactly, isIndirect ended up on DBG_VALUE, maybe it was there before

NB: after some prodding, I'm of the opinion that the "isIndirect" flag of a
DBG_VALUE is technically redundant. As far as I'm aware, the purpose of the
flag is so that we can easily refer to the value of a stack slot when a
spill occurs. But now that dbg.addr intrinsics can generate DBG_VALUE insts
that are already indirect, we have to cope with multiple levels of
indirection [0], so little has been gained. (Assuming my assumption about
its purpose is correct).

That leaves us with the isIndirect flag adding confusion but no additional
functionality (IMHO). One can express the same location in different ways
with isIndirect, for example:

DBG_VALUE $reg, $noreg, !foo, !DIExpression(DW_OP_deref)
DBG_VALUE $reg, 0, !foo, !DIExpression()

and as shown with comment 8, sometimes the addition of isIndirect doesn't
actually make a difference to the location.

Soooooo, I've taken a shot at getting rid of isIndirect -- first by never
generating any indirect DBG_VALUEs (and instead appending DW_OP_deref to the
relevant expressions), maybe later by eliminating the operand entirely. I
haven't yet managed to generate an identical copy of clang-3.4, which I
believe is because I don't fully understand what the DW_OP_deref_size work
in PrologEpilogInserter.cpp does. All the test failures are minor changes in
representation and bug 41992, nothing serious.

I imagine I'll have to mail llvm-dev at some point to check that there isn't
some part of the design I've missed; this is more to record how I currently
view this bug.

[0]
https://github.com/llvm/llvm-project/blob/
b9f1e7b/llvm/lib/CodeGen/LiveDebugVariables.
cpp#L1302

I /think/ I'm probably to blame for isindirect - though it's a twisty path and it might be an unrelated (or at least different, but related) case of isindirect that you're talking about here.

I originally added an "isindirect" flag on DIVariable - back when we didn't have DIExpression, etc. (see r184365, r184367, and r184368)

This was to describe indirect pass by value parameters in C++. Initially (see #15135 for pointers) indirect pass by value parameters had had their type modified in clang so the DWARF would record them as reference parameters, not value parameters - but that wasn't the correct way to represent them (meant that calling the functions in the debugger did the wrong thing (not copying the parameter), among other problems).

So I fixed that bug in the frontend in r183329, but that messed up the location descriptions that I then fixed with the addition of the isindirect flag.

I think the isindirect flag then ended up being propagated into the early versions of the DIExpression work when that new support may not've been up to the task of describing the indirection in a more general way.

I think now DIExpression & co can probably describe this as just another OP_deref or whatever & there's probably no need for special handling.

@jmorse
Copy link
Member Author

jmorse commented Jun 17, 2019

Candidate patch for the too-much-indirection issue: https://reviews.llvm.org/D63429

David wrote:

I think the isindirect flag then ended up being propagated into the early
versions of the DIExpression work when that new support may not've been up
to the task of describing the indirection in a more general way.

Aha, thanks for the history, that does sound like the source of the isIndirect operand. It seems unlikely that it can be doing anything that can't be done with correct DIExpressions, but I'll mail llvm-dev anyway to avoid anyone getting unexpected surprises, then try to kill it off shortly.

@jmorse
Copy link
Member Author

jmorse commented Jul 1, 2019

Fixed in https://reviews.llvm.org/rL364736 ; killing off the isIndirect flag is a non-urgent improvement for the future IMO, I'll likely prod that after llvm-9 branches.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla llvm:codegen wrong-debug
Projects
None yet
Development

No branches or pull requests

4 participants