Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DebugInfo@O2] Bogus parameter value presented for struct argument #39535

Open
jmorse opened this issue Dec 30, 2018 · 16 comments
Open

[DebugInfo@O2] Bogus parameter value presented for struct argument #39535

jmorse opened this issue Dec 30, 2018 · 16 comments
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party wrong-debug

Comments

@jmorse
Copy link
Member

jmorse commented Dec 30, 2018

Bugzilla Link 40188
Version trunk
OS Linux
Blocks #38116
CC @dstenb,@dwblaikie,@gregbedwell,@CarlosAlbertoEnciso,@pogo59,@tbosch

Extended Description

Using r350008 and the command line "clang-8 -O2 -g test.cpp -o a.out -fno-inline", in the code below a bogus value is reported for the value of 'eyelids' when entering the 'foo' function. The code:

--------8<--------
struct bees {
int a;
int b;
};

int
foo(struct bees eyelids)
{
return eyelids.a + eyelids.b;
}

int
main()
{
struct bees xyzzy = { 1, 2 };
return foo(xyzzy);
}
-------->8--------

Gdbs interpretation on stepping into foo:

(gdb) frame
#​0 foo (eyelids=...) at test.cpp:8
8 {
(gdb) print eyelids
$1 = {a = 1, b = 4195504}

Compiling with g++ 6.3 reports eyelids.b=2 at the same location. Examining clangs output with llvm-dwarfdump, we generate the following expression for "eyelids" for the body of "foo":

DW_OP_reg5 RDI, DW_OP_piece 0x4, DW_OP_reg0 RAX, DW_OP_piece 0x4

when the value of eyelids is actually only passed in rdi (upper and lower portions). Compiling with g++ and -gstrict-dwarf correctly gives only "DW_OP_reg5 RDI". The body of "foo" moves eyelids.b to eax initially, it appears that the location of that fragment gets hoisted to cover the beginning of the function, even when eax isn't def'd yet. I'm 95% confident the code that does that is here [0], which appears to assume the first DBG_VALUE of a parameter can have its location forwarded to the function start.

That assumption might have been valid given that there's code in SelectionDAG [1] that emits DBG_VALUEs for parameters at the start of functions, although for byvals only. Perhaps it used to produce entry DBG_VALUEs for every parameter in the past?

There's a test in-tree that already exhibits this behaviour, test/DebugInfo/ARM/partial-subreg.ll , which builds the 'self' parameter from stack into registers over several instructions, but the location covers the whole function, incorrectly. I ran into this problem when fiddling elsewhere, the reproducer is to generalise the problem.

[0] https://github.com/llvm-mirror/llvm/blob/6cd86b7cd2bb90a97d8a75196d6c0a4365c49a8b/lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp#L219
[1] https://github.com/llvm-mirror/llvm/blob/27f17bfee31bec92b918f4ca6a6f7a2a37e4d00c/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp#L818

@dstenb
Copy link
Mannequin

dstenb mannequin commented Jan 24, 2019

Gdbs interpretation on stepping into foo:

(gdb) frame
#​0 foo (eyelids=...) at test.cpp:8
8 {
(gdb) print eyelids
$1 = {a = 1, b = 4195504}

By stepping, do you mean stepping to the first instruction in the function, or a normal line number step?

With r350008, when doing a line number step into the function, meaning that we land after the prologue end, eyelids is printed with the right values for me:

(gdb) b *foo
Breakpoint 1 at 0x4004c0: file pr40188.c, line 8.
(gdb) b foo
Breakpoint 2 at 0x4004c7: file pr40188.c, line 9.
(gdb) run
Starting program: /home/edasten/upstream/master/a.out 

Breakpoint 1, foo (eyelids=...) at pr40188.c:8
8	{
(gdb) print eyelids
$1 = {a = 1, b = 4195536}
(gdb) c
Continuing.
Breakpoint 2, foo (eyelids=...) at pr40188.c:9
9	        return eyelids.a + eyelids.b;
(gdb) print eyelids
$2 = {a = 1, b = 2}

and eyelids.b is printed incorrect at first instruction in the function.

If the file is instead compiled with -O0, then a single location description is emitted, which says that the argument is located on the stack, meaning that we will print the parameter using garbage stack data at the first instruction of the function:

(gdb) b *foo
Breakpoint 1 at 0x4004c0: file pr40188.c, line 8.
(gdb) b foo
Breakpoint 2 at 0x4004c8: file pr40188.c, line 9.
(gdb) run
Starting program: /home/edasten/upstream/master/a.out 

Breakpoint 1, foo (eyelids=...) at pr40188.c:8
8	{
(gdb) print eyelids
$1 = {a = 0, b = 0}
(gdb) disas
Dump of assembler code for function foo:
=> 0x00000000004004c0 <+0>:	push   %rbp
   0x00000000004004c1 <+1>:	mov    %rsp,%rbp
   0x00000000004004c4 <+4>:	mov    %rdi,-0x8(%rbp)
   0x00000000004004c8 <+8>:	mov    -0x8(%rbp),%eax
   0x00000000004004cb <+11>:	add    -0x4(%rbp),%eax
   0x00000000004004ce <+14>:	pop    %rbp
   0x00000000004004cf <+15>:	retq   
End of assembler dump.

It seems to be me that we should not expect to be able to print parameters with the correct values (or as unavailable rather than with incorrect values) inside the prologue for optimized code, if we can't do that in non-optimized cases? Or am I overlooking something here?

@dstenb
Copy link
Mannequin

dstenb mannequin commented Jan 25, 2019

I removed the code that changes the start label for parameters (see remove.p), and ran check-llvm, which gave the following failures:

Failing Tests (18):
test/CodeGen/AMDGPU/llvm.dbg.value.ll
test/CodeGen/X86/2010-05-26-DotDebugLoc.ll
test/CodeGen/X86/debug-loclists.ll
test/DebugInfo/AArch64/asan-stack-vars.mir
test/DebugInfo/ARM/partial-subreg.ll
test/DebugInfo/COFF/fp-stack.ll
test/DebugInfo/COFF/fpo-stack-protect.ll
test/DebugInfo/COFF/pieces.ll
test/DebugInfo/COFF/register-variables.ll
test/DebugInfo/MIR/AArch64/clobber-sp.mir
test/DebugInfo/NVPTX/debug-info.ll
test/DebugInfo/X86/dbg-value-regmask-clobber.ll
test/DebugInfo/X86/debug-loc-asan.mir
test/DebugInfo/X86/debug-loc-offset.mir
test/DebugInfo/X86/dw_op_minus_direct.ll
test/DebugInfo/X86/loclists-dwp.ll
test/DebugInfo/X86/pieces-1.ll
test/DebugInfo/X86/pieces-3.ll

I have yet only looked at the tests that emit DWARF as output, as I'm not familiar with CodeView.

I have attached the log as build.log.

1.) The debug values are inside the prologue, and setting the start at function begin is okay:

In these cases the first non-overlapping fragments of the debug values
for the parameters refer to live-in registers, so it is okay to say
that they are valid from the beginning of the function.

  • test/DebugInfo/X86/pieces-1.ll

    Lfunc_begin0:
    .file 1 "pieces.c"
    .loc 1 3 0 ## pieces.c:3:0
    .cfi_startproc

    %bb.0: ## %entry

         pushq   %rbp
         .cfi_def_cfa_offset 16
         .cfi_offset %rbp, -16
         movq    %rsp, %rbp
         .cfi_def_cfa_register %rbp
    

    Ltmp0:
    ##DEBUG_VALUE: foo:s <- [DW_OP_LLVM_fragment 0 64] $rdi
    ##DEBUG_VALUE: foo:s <- [DW_OP_LLVM_fragment 64 32] $esi
    .loc 1 4 0 prologue_end ## pieces.c:4:0
    movl %esi, %eax

  • test/CodeGen/X86/2010-05-26-DotDebugLoc.ll

    Lfunc_begin0:
    .loc 1 17 0 ## foo.c:17:0
    .cfi_sections .debug_frame
    .cfi_startproc

    %bb.0: ## %entry

         pushq   %rbx
         .cfi_def_cfa_offset 16
         .cfi_offset %rbx, -16
    

    Ltmp0:
    ##DEBUG_VALUE: bar:myvar <- $rdi
    movq %rdi, %rbx
    Ltmp1:
    ##DEBUG_VALUE: bar:myvar <- $rbx
    .loc 1 18 0 prologue_end ## foo.c:18:0
    movl (%rdi), %edi

  • test/CodeGen/X86/debug-loclists.ll

    .Lfunc_begin0:
    .file 0 "/home/folder/test.cc" md5 0xe0f357ad6dcb791a774a0dae55baf5e7
    .file 1 "test.cc" md5 0xe0f357ad6dcb791a774a0dae55baf5e7
    .loc 1 6 0 # test.cc:6:0
    .cfi_startproc

    %bb.0: # %entry

         pushq   %rbx
         .cfi_def_cfa_offset 16
         .cfi_offset %rbx, -16
    

    .Ltmp0:
    #DEBUG_VALUE: baz:a <- [$rdi+0]
    movq %rdi, %rbx
    .Ltmp1:
    #DEBUG_VALUE: baz:a <- [$rbx+0]
    .loc 1 7 6 prologue_end # test.cc:7:6
    callq _ZN1A3fooEv

  • test/DebugInfo/X86/dbg-value-regmask-clobber.ll

    main: # @​main
    .Lfunc_begin0:
    .loc 1 4 0 # t.cpp:4:0
    .seh_proc main

    %bb.0: # %entry

         subq    $40, %rsp
         .seh_stackalloc 40
         .seh_endprologue
    

    .Ltmp0:
    #DEBUG_VALUE: main:argv <- $rdx
    #DEBUG_VALUE: main:argc <- $ecx
    .loc 1 5 5 prologue_end # t.cpp:5:5
    movl $1, x(%rip)

2.) The debug values are inside the prologue, and setting the start at function begin made us start the range before one or more definitions:

  • test/DebugInfo/X86/pieces-3.ll

    In this case, changing the start of the range for i1 moves it over the
    definition of $eax, which one fragment of the variable is described by.

    Lfunc_begin0:
    .file 1 "sroasplit-2.c"
    .loc 1 10 0 ## sroasplit-2.c:10:0
    .cfi_startproc

    %bb.0:

         pushq   %rbp
         .cfi_def_cfa_offset 16
         .cfi_offset %rbp, -16
         movq    %rsp, %rbp
         .cfi_def_cfa_register %rbp
    

    Ltmp0:
    ##DEBUG_VALUE: foo:outer <- [DW_OP_LLVM_fragment 0 64] $rdi
    ##DEBUG_VALUE: foo:outer <- [DW_OP_LLVM_fragment 96 32] $rsi
    movq %rsi, %rax
    Ltmp1:
    ##DEBUG_VALUE: foo:i1 <- [DW_OP_LLVM_fragment 32 32] undef
    ##DEBUG_VALUE: foo:i1 <- [DW_OP_LLVM_fragment 0 32] $eax
    ##DEBUG_VALUE: foo:outer <- [DW_OP_LLVM_fragment 96 32] $eax
    ## kill: def $eax killed $eax killed $rax
    .loc 1 12 0 prologue_end ## sroasplit-2.c:12:0
    popq %rbp

  • test/DebugInfo/X86/debug-loc-asan.mir

    .Lfunc_begin0:
    .file 1 "/llvm_cmake_gcc/test.cc"
    .loc 1 1 0 # test.cc:1:0
    .cfi_startproc
    [...]
    movq %rax, %rcx
    addq $32, %rcx
    [...]
    .Ltmp0:
    #DEBUG_VALUE: bar:y <- [DW_OP_deref] [$rcx+0]
    cmpb $0, %r8b
    [...]
    .loc 1 2 0 prologue_end # test.cc:2:0

3.) The debug values are after the prologue, and setting the start at function begin made us start the range before one or more definitions:

  • test/CodeGen/AMDGPU/llvm.dbg.value.ll

         .end_amd_kernel_code_t
    

    ; %bb.0: ; %entry
    .loc 1 1 42 prologue_end ; /tmp/test_debug_value.cl:1:42
    s_load_dwordx2 s[4:5], s[4:5], 0x0
    .Ltmp0:
    ;DEBUG_VALUE: test_debug_value:globalptr_arg <- $sgpr4_sgpr5

  • test/DebugInfo/ARM/partial-subreg.ll

    As previously mentioned by Jeremy, in this example we move the ranges for
    the debug values over their definitions.

    Lfunc_begin0:
    .file 1 "/simd.swift"
    .loc 1 0 0 @ simd.swift:0:0
    @ %bb.0: @ %entry
    .loc 1 5900 12 prologue_end @ simd.swift:5900:12
    cmp r0, #​0
    bne LBB0_2
    Ltmp0:
    @ %bb.1:
    .loc 1 0 12 is_stmt 0 @ simd.swift:0:12
    ldr r0, [sp]
    vmov d16, r1, r2
    vmov d17, r3, r0
    Ltmp1:
    @​DEBUG_VALUE: subscript.get:self <- [DW_OP_LLVM_fragment 0 96] $q8
    b LBB0_3
    Ltmp2:
    LBB0_2: @ %select.false
    add r0, sp, #​4
    vld1.32 {d16, d17}, [r0]
    Ltmp3:
    @​DEBUG_VALUE: subscript.get:self <- [DW_OP_LLVM_fragment 96 96] $q8
    LBB0_3: @ %select.end

  • test/DebugInfo/X86/dw_op_minus_direct.ll

    Starting the range at start of the function results in the variable
    being evaluated using an incorrect value of $eax at the first
    instruction after prologue end.

    This test case seems kinda artificial to me, as I would expect us to have a
    dbg.value for the parameter before the add, so maybe one should not draw too
    much of a conclusion from this.

    Lfunc_begin0:
    .file 1 "/tmp/minus.c"
    .loc 1 1 0 ## minus.c:1:0
    .cfi_sections .debug_frame
    .cfi_startproc

    %bb.0: ## %entry

                                         ## kill: def $edi killed $edi def $rdi
         .loc    1 2 11 prologue_end     ## minus.c:2:11
         leal    1(%rdi), %eax
    

    Ltmp0:
    ##DEBUG_VALUE: inc:i <- [DW_OP_constu 1, DW_OP_minus, DW_OP_stack_value] $eax
    .loc 1 2 3 is_stmt 0 ## minus.c:2:3

3.) Tests I have not looked at due to them being CodeView tests:

test/DebugInfo/COFF/fp-stack.ll
test/DebugInfo/COFF/fpo-stack-protect.ll
test/DebugInfo/COFF/pieces.ll
test/DebugInfo/COFF/register-variables.ll

4.) Tests I have not looked into yet:

test/DebugInfo/AArch64/asan-stack-vars.mir
test/DebugInfo/MIR/AArch64/clobber-sp.mir
test/DebugInfo/NVPTX/debug-info.ll
test/DebugInfo/X86/loclists-dwp.ll
test/DebugInfo/X86/debug-loc-offset.mir

One have to take into consideration that these are modified IR/MIR tests, which may have unnatural debug value placements (especially MIR tests), but still, the tests I have looked at seems to indicate that moving the range in this way is more harmful than helpful. At least in the DWARF tests I have looked at, there is no case where moving the start of the range results in more post-epilogue instructions being covered by correct locations, without the addition of false positives.

@dstenb
Copy link
Mannequin

dstenb mannequin commented Jan 25, 2019

@dstenb
Copy link
Mannequin

dstenb mannequin commented Jan 25, 2019

@jmorse
Copy link
Member Author

jmorse commented Jan 25, 2019

Hi David,

[I guess I should stop commissioning new metasyntactic variable names, reading them back now is awkward,]

By stepping, do you mean stepping to the first instruction in the
function, or a normal line number step?

In gdb, I enter "start" to begin the process and land in the 'main' function, then enter 's' to step into the 'foo' function. This lands me on the first instruction of the function, rather than at the prologue end, as you say.

It seems to be me that we should not expect to be able to print parameters
with the correct values (or as unavailable rather than with incorrect
values) inside the prologue for optimized code, if we can't do that in non-
optimized cases? Or am I overlooking something here?

Hmmm. That reasoning makes complete sense -- I have a strong feeling of "the compiler could get this right therefore it should" that doesn't want to admit this though. To my mind, parameters have defined values at the transfer of control into the function, and thus should have locations.

Fiddling with the test case though I've managed to reproduce an incorrect location after the prologue_end flag, by replacing the foo function with:

--------8<--------
int global;

int
foo(struct bees eyelids)
{
global = eyelids.a;
return eyelids.b;
}
-------->8--------

Compiling that with the same command line (although I've moved on to r351956) gives the code:

--------8<--------
Dump of assembler code for function foo:
=> 0x0000000000400480 <+0>: mov %edi,0x200ba6(%rip) # 0x60102c
0x0000000000400486 <+6>: mov %rdi,%rax
0x0000000000400489 <+9>: shr $0x20,%rax
0x000000000040048d <+13>: retq
-------->8--------

The line table:

--------8<--------
Address Line Column File ISA Discriminator Flags


0x0000000000400480 10 0 1 0 0 is_stmt
0x0000000000400480 11 16 1 0 0 is_stmt prologue_end
0x000000000040048d 12 9 1 0 0 is_stmt
-------->8--------

And the location for 'eyelids':

--------8<--------
DW_AT_location (0x00000000
[0x0000000000400480, 0x000000000040048d): DW_OP_reg5 RDI, DW_OP_piece 0x4, DW_OP_reg0 RAX, DW_OP_piece 0x4)
DW_AT_name ("eyelids")
-------->8--------

The prologue is nonexistent and ends on the first instruction of the function. Following your example putting a breakpoint in gdb on 'foo' and running, gdb breaks on address 0x400480, where eax is still uninitialised, and printing 'eyelids' gives an incorrect reading.

~

Thanks for the comprehensive test analysis, it really does sound like that code segment is causing damage. I would imagine that with that patch applied, for any function that has a prologue, all parameters would have no location until the prologue ends, is that right? If so and many debug users expect to be able to step-into a function and see parameters immediately (as I typically do) this creates a conflict between debug completeness and debug accuracy :/.

That then goes back to whether parameters should be readable in the prologue. Confusingly, if I compile the original code at the top of this bug with -O0, then when stepping into 'foo' from 'main', gdb stops at the end of the prologue. But when the code is compiled with -O2, gdb decides to stop on the first instruction of the function. lldb-6.0 replicates this behaviour. I'll try and dig into why this is, although I suspect "what gcc does" will be at the bottom of that hole.

@dstenb
Copy link
Mannequin

dstenb mannequin commented Jan 25, 2019

In gdb, I enter "start" to begin the process and land in the 'main'
function, then enter 's' to step into the 'foo' function. This lands me on
the first instruction of the function, rather than at the prologue end, as
you say.

Oh, sorry! I was under the assumption that s/step would ignore instructions with line numbers in the prologue.

The GDB function that seems to be used for when the command steps into functions (handle_step_into_function in infrun.c) has code that seems to be intended to skip to the end of the prologue (which uses the target hook gdbarch_skip_prologue), but I have only very quickly skimmed through the code, and not looked at it in a debugger, so maybe there is some caveat that causes it to not trigger here.

I guess that this might invalidate some assumptions I made in my earlier comments.

Thanks for the comprehensive test analysis, it really does sound like that
code segment is causing damage. I would imagine that with that patch
applied, for any function that has a prologue, all parameters would have no
location until the prologue ends, is that right?

At least for the targets I have looked at, the emitPrologue() implementations emit instructions directly at the start of the block, effectively moving down pre-existing DBG_VALUEs, so in general I don't think we'll have DBG_VALUEs at the first instruction of the function, at least.

Maybe one could fix that by, before emitting the prologue, stashing away parameter-referencing DBG_VALUEs that are placed at the start of the block, and later re-insert them back at the start? I guess that there are some exceptions to consider there, e.g. for DBG_VALUE referring to stack positions, but maybe it could at least apply for register DBG_VALUEs as a start?

@jmorse
Copy link
Member Author

jmorse commented Jan 25, 2019

Oh, sorry! I was under the assumption that s/step would ignore instructions
with line numbers in the prologue.

Yeah, this is a new "feature" to me as well. It's not immediately clear why debuggers act this way, but it seems to be an established behaviour.

Maybe one could fix that by, before emitting the prologue, stashing away
parameter-referencing DBG_VALUEs that are placed at the start of the block,
and later re-insert them back at the start? I guess that there are some
exceptions to consider there, e.g. for DBG_VALUE referring to stack
positions, but maybe it could at least apply for register DBG_VALUEs as a
start?

I reckon this will work -- the prologue insertion can't change the register argument locations (AFAIK) or the rest of the function would have to change too, so the register debug locations must stay the same as well. Some kind of target customisation facility might be in order in case they insert exotic instructions/facilities in prologues? Exactly what is guaranteed by LLVM prologue / epilogues is out of my area of expertise though.

@dwblaikie
Copy link
Collaborator

Yeah, I'm going to vaguely reiterate what others have said here - it seems unlikely the debugger /should/ be trying to render values before the prologue is finished. If locations needed to be accurate there, then no one could use simple location descriptions for any variable (since they'd all need to describe the register in the prologue, and the memory location later, etc, even/especially in -O0 builds) - it'd all need to be in location lists, which would be verbose/inefficient.

So I think it'd be worth understanding what the rules and conventions are about prologue locations and stepping behavior. Why does GDB (& LLDB? & whoever else) sometimes step passed the prologue and sometimes not? What are the conditions that dictate that & is it reasonable to only emit location descriptions that are accurate in the prologue under those conditions? Should we push back and suggest debuggers are making a sub-optimal choice here & should instead always step over the prologue/never expect a prologue-accurate location description?

@gregbedwell
Copy link
Collaborator

In gdb, I enter "start" to begin the process and land in the 'main'
function, then enter 's' to step into the 'foo' function. This lands me on
the first instruction of the function, rather than at the prologue end, as
you say.

Oh, sorry! I was under the assumption that s/step would ignore instructions
with line numbers in the prologue.

I imagine that our debugger team at SIE would tell us that it's the job of the compiler to do this via the is_stmt flag. I wonder if that's what's causing this behaviour? I'm not sure how well debuggers generally respect is_stmt.

Address Line Column File ISA Discriminator Flags


0x0000000000000000 8 0 1 0 0 is_stmt
0x0000000000000007 9 26 1 0 0 is_stmt prologue_end
0x000000000000000a 9 9 1 0 0
0x000000000000000b 9 9 1 0 0 end_sequence

[I guess I should stop commissioning new metasyntactic variable names,
reading them back now is awkward,]

... but I always enjoy reading your testcases as a result :) ...

@dwblaikie
Copy link
Collaborator

In gdb, I enter "start" to begin the process and land in the 'main'
function, then enter 's' to step into the 'foo' function. This lands me on
the first instruction of the function, rather than at the prologue end, as
you say.

Oh, sorry! I was under the assumption that s/step would ignore instructions
with line numbers in the prologue.

I imagine that our debugger team at SIE would tell us that it's the job of
the compiler to do this via the is_stmt flag. I wonder if that's what's
causing this behaviour? I'm not sure how well debuggers generally respect
is_stmt.

Address Line Column File ISA Discriminator Flags


0x0000000000000000 8 0 1 0 0 is_stmt
0x0000000000000007 9 26 1 0 0 is_stmt
prologue_end
0x000000000000000a 9 9 1 0 0
0x000000000000000b 9 9 1 0 0 end_sequence

Actually years ago Eric & I fixed a pretty significant GDB failure due to is_stmt - basically when asked to break at a certain function, GCC would start at the instruction that function started on and try to find the nearest enclosing statement - it would do so by searching backwards in the line table until it found an instruction flagged with is_stmt, then break on all the instructions between that is_stmt and the next. This would result in GDB breaking at the end of the previous emitted function and the beginning of the intended function...

Suffice to say this was behavior that was too bad to say "we'll keep emitting debug info this way and GDB can fix this", we changed Clang to emit is_stmt in the prologue.

Maybe GDB's been fixed by now, but I don't know. (& not sure what other issues might result from emitting fewer is_stmt)

[I guess I should stop commissioning new metasyntactic variable names,
reading them back now is awkward,]

... but I always enjoy reading your testcases as a result :) ...

@dstenb
Copy link
Mannequin

dstenb mannequin commented Jan 29, 2019

Actually years ago Eric & I fixed a pretty significant GDB failure due to
is_stmt - basically when asked to break at a certain function, GCC would
start at the instruction that function started on and try to find the
nearest enclosing statement - it would do so by searching backwards in the
line table until it found an instruction flagged with is_stmt, then break on
all the instructions between that is_stmt and the next. This would result in
GDB breaking at the end of the previous emitted function and the beginning
of the intended function...

Suffice to say this was behavior that was too bad to say "we'll keep
emitting debug info this way and GDB can fix this", we changed Clang to emit
is_stmt in the prologue.

Maybe GDB's been fixed by now, but I don't know. (& not sure what other
issues might result from emitting fewer is_stmt)

Interesting. Do you remember if there was a Bugzilla PR written for that, or some other resource one could look at? I tried skimming through commits in some related files in lib/CodeGen/AsmPrinter, but I was unable to find something that seemed related to that issue.

@dwblaikie
Copy link
Collaborator

Looking back through some history, r225010 is a more recent is_stmt change, to omit it from lines with the same line number (rather than is_stmt for every column/discriminator change, as well as every line number change)

Ah, here it is - yeah, hard to find. #13675 has some related discussion and the change that fixed the GDB issue I was describing was r169304.

@dstenb
Copy link
Mannequin

dstenb mannequin commented Jan 30, 2019

Fiddling with the test case though I've managed to reproduce an incorrect
location after the prologue_end flag, by replacing the foo function with:

--------8<--------
int global;

int
foo(struct bees eyelids)
{
global = eyelids.a;
return eyelids.b;
}
-------->8--------

Compiling that with the same command line (although I've moved on to
r351956) gives the code:

--------8<--------
Dump of assembler code for function foo:
=> 0x0000000000400480 <+0>: mov %edi,0x200ba6(%rip) # 0x60102c

0x0000000000400486 <+6>: mov %rdi,%rax
0x0000000000400489 <+9>: shr $0x20,%rax
0x000000000040048d <+13>: retq
-------->8--------

The line table:

--------8<--------
Address Line Column File ISA Discriminator Flags


0x0000000000400480 10 0 1 0 0 is_stmt
0x0000000000400480 11 16 1 0 0 is_stmt
prologue_end
0x000000000040048d 12 9 1 0 0 is_stmt
-------->8--------

And the location for 'eyelids':

--------8<--------
DW_AT_location (0x00000000
[0x0000000000400480, 0x000000000040048d): DW_OP_reg5 RDI, DW_OP_piece
0x4, DW_OP_reg0 RAX, DW_OP_piece 0x4)
DW_AT_name ("eyelids")
-------->8--------

The prologue is nonexistent and ends on the first instruction of the
function. Following your example putting a breakpoint in gdb on 'foo' and
running, gdb breaks on address 0x400480, where eax is still uninitialised,
and printing 'eyelids' gives an incorrect reading.

From SROA we get the following output:

*** IR Dump After SROA ***
; Function Attrs: noinline nounwind uwtable
define dso_local i32 @​foo(i64 %eyelids.coerce) #​0 !dbg !​11 {
entry:
%eyelids.sroa.0.0.extract.trunc = trunc i64 %eyelids.coerce to i32
call void @​llvm.dbg.value(metadata i32 %eyelids.sroa.0.0.extract.trunc, metadata !​19, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 32)), !dbg !​20
%eyelids.sroa.2.0.extract.shift = lshr i64 %eyelids.coerce, 32
%eyelids.sroa.2.0.extract.trunc = trunc i64 %eyelids.sroa.2.0.extract.shift to i32
call void @​llvm.dbg.value(metadata i32 %eyelids.sroa.2.0.extract.trunc, metadata !​19, metadata !DIExpression(DW_OP_LLVM_fragment, 32, 32)), !dbg !​20
store i32 %eyelids.sroa.0.0.extract.trunc, i32* @​global, align 4, !dbg !​21, !tbaa !​22
ret i32 %eyelids.sroa.2.0.extract.trunc, !dbg !​26
}

And after instruction selection we have:

*** IR Dump After X86 DAG->DAG Instruction Selection ***:

Machine code for function foo: IsSSA, TracksLiveness

Function Live Ins: $rdi in %0

bb.0.entry:
liveins: $rdi
%0:gr64 = COPY $rdi
%1:gr32 = COPY %0.sub_32bit:gr64
DBG_VALUE %1:gr32, $noreg, !"eyelids", !DIExpression(DW_OP_LLVM_fragment, 0, 32), debug-location !​20; pr40188.c:9:17 line no:9
%2:gr64 = SHR64ri %0:gr64(tied-def 0), 32, implicit-def dead $eflags
%3:gr32 = COPY %2.sub_32bit:gr64
DBG_VALUE %3:gr32, $noreg, !"eyelids", !DIExpression(DW_OP_LLVM_fragment, 32, 32), debug-location !​20; pr40188.c:9:17 line no:9
MOV32mr $rip, 1, $noreg, @​global, $noreg, killed %1:gr32, debug-location !​21 :: (store 4 into @​global, !tbaa !​22); pr40188.c:11:16
$eax = COPY %3:gr32, debug-location !​26; pr40188.c:12:9
RET 0, $eax, debug-location !​26; pr40188.c:12:9

Later on, we move up an instruction with line number information, ending the prologue before the DBG_VALUE for the second field in eyelids:

*** IR Dump After Machine Instruction Scheduler ***:

Machine code for function foo: NoPHIs, TracksLiveness

Function Live Ins: $rdi in %0

0B bb.0.entry:
liveins: $rdi
16B %0:gr64_with_sub_8bit = COPY $rdi
DBG_VALUE %0.sub_32bit:gr64_with_sub_8bit, $noreg, !"eyelids", !DIExpression(DW_OP_LLVM_fragment, 0, 32), debug-location !​20; pr40188.c:9:17 line no:9
96B MOV32mr $rip, 1, $noreg, @​global, $noreg, %0.sub_32bit:gr64_with_sub_8bit, debug-location !​21 :: (store 4 into @​global, !tbaa !​22); pr40188.c:11:16
100B %2:gr64_with_sub_8bit = COPY %0:gr64_with_sub_8bit
104B %2:gr64_with_sub_8bit = SHR64ri %2:gr64_with_sub_8bit(tied-def 0), 32, implicit-def dead $eflags
DBG_VALUE %2.sub_32bit:gr64_with_sub_8bit, $noreg, !"eyelids", !DIExpression(DW_OP_LLVM_fragment, 32, 32), debug-location !​20; pr40188.c:9:17 line no:9
112B $eax = COPY %2.sub_32bit:gr64_with_sub_8bit, debug-location !​26; pr40188.c:12:9
128B RET 0, killed $eax, debug-location !​26; pr40188.c:12:9

Later on, DebugHandlerBase::beginFunction() incorrectly moves the start of the ranges to the beginning of the function.

I'm not sure who's "misbehaving" here, if we want to get a valid debug location for the parameter at the start of the function. I'm not too well-versed with the code in SelectionDAG, but I at least noted that EmitFuncArgumentDbgValue() do not trigger here, since it does not look past the truncs and shifts. This means that we don't get any DBG_VALUEs directly at the start of the function. However, I don't know if it would make sense to look past and handle such "wrapping" instructions there (by amending the debug expressions). We would ideally want to emit a simple location description that says that the variable is located in RDI at the start of the block. Maybe SROA could help SelectionDAG along by, in addition to emitting each fragmented dbg.value, emit a preceding dbg.value which describes the whole parameter using %eyelids.coerce?

@jmorse
Copy link
Member Author

jmorse commented Jan 31, 2019

Sounds like we should push the discussion of is_stmt to later or, or possibly another ticket?,

So that we're all on the same page, fixing this (latest reproducer) is something I believe we should be doing rather than marking the argument undef/optimised-out, because it doesn't amount to describing the argument inside of the prologue. This is more about describing that fragment away from its use. (A use that could alternately have been sunk further down the function).

David Stenberg wrote:

I'm not sure who's "misbehaving" here, if we want to get a valid debug
location for the parameter at the start of the function. I'm not too well-
versed with the code in SelectionDAG, but I at least noted that
EmitFuncArgumentDbgValue() do not trigger here, since it does not look
past the truncs and shifts. This means that we don't get any DBG_VALUEs
directly at the start of the function. However, I don't know if it would
make sense to look past and handle such "wrapping" instructions there (by
amending the debug expressions). We would ideally want to emit a simple
location description that says that the variable is located in RDI at the
start of the block. Maybe SROA could help SelectionDAG along by, in
addition to emitting each fragmented dbg.value, emit a preceding dbg.value
which describes the whole parameter using %eyelids.coerce?

Hmmmm. IMHO this is a symptom of debuginfo-in-the-instruction-stream pain: this information is much better recorded as a property of the Function.

My feeling is that fixing in SelectionDAG would be a special case too far: when handing that dbg.value, as you say it would need to look past the truncs and shifts for many dbg.values to see whether there's an Argument at the bottom, which would probably be expensive. And ultimately it just hacks around the fact that this information about arguments can float anywhere around a Function.

The SROA option is almost certainly better, although I'm almost entirely unfamiliar with SROA. We should (TM) be able to describe the argument fragment using only the Argument itself (because the type must be fixed), and if we can rely on the rest of LLVM leaving those dbg.values at the front of the entry block, not only will it fix this issue, but we can delete the beginFunction() code and simplify lots of SelectionDAG as well. Maybe even add this to the IR Verifier as well?

@tbosch
Copy link
Contributor

tbosch commented Aug 3, 2020

As the last comment is over a year old: This still reproduced at head (with the updated example from #c5) in lldb with dwarf4 and dwarf5:

(lldb) s
Process 2217599 stopped

  • thread #​1, name = 'a.out', stop reason = step in
    frame #​0: 0x0000000000401110 a.out`foo(eyelids=(a = 1, b = 0)) at test2.cc:11:16
    8 int
    9 foo(struct bees eyelids)
    10 {
    -> 11 global = eyelids.a;
    12 return eyelids.b;
    13 }

eyelids should be {a = 1, b = 2}

With gdb, I am seeing the following when stepping into the function:

(gdb) s
foo (eyelids=...) at test2.cc:11
11 global = eyelids.a;
(gdb) print eyelids
$1 = {a = 1, b = }

In both cases, after the next step, eyelids is shown correctly as {a = 1, b = 2}.

@dstenb
Copy link
Mannequin

dstenb mannequin commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#40638

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@llvmbot llvmbot added the confirmed Verified by a second party label Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla confirmed Verified by a second party wrong-debug
Projects
None yet
Development

No branches or pull requests

5 participants