-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DebugInfo@O2] Bogus parameter value presented for struct argument #39535
Comments
By stepping, do you mean stepping to the first instruction in the function, or a normal line number step? With r350008, when doing a line number step into the function, meaning that we land after the prologue end, eyelids is printed with the right values for me:
and eyelids.b is printed incorrect at first instruction in the function. If the file is instead compiled with -O0, then a single location description is emitted, which says that the argument is located on the stack, meaning that we will print the parameter using garbage stack data at the first instruction of the function:
It seems to be me that we should not expect to be able to print parameters with the correct values (or as unavailable rather than with incorrect values) inside the prologue for optimized code, if we can't do that in non-optimized cases? Or am I overlooking something here? |
I removed the code that changes the start label for parameters (see remove.p), and ran check-llvm, which gave the following failures: Failing Tests (18): I have yet only looked at the tests that emit DWARF as output, as I'm not familiar with CodeView. I have attached the log as build.log. 1.) The debug values are inside the prologue, and setting the start at function begin is okay: In these cases the first non-overlapping fragments of the debug values
2.) The debug values are inside the prologue, and setting the start at function begin made us start the range before one or more definitions:
3.) The debug values are after the prologue, and setting the start at function begin made us start the range before one or more definitions:
3.) Tests I have not looked at due to them being CodeView tests: test/DebugInfo/COFF/fp-stack.ll 4.) Tests I have not looked into yet: test/DebugInfo/AArch64/asan-stack-vars.mir One have to take into consideration that these are modified IR/MIR tests, which may have unnatural debug value placements (especially MIR tests), but still, the tests I have looked at seems to indicate that moving the range in this way is more harmful than helpful. At least in the DWARF tests I have looked at, there is no case where moving the start of the range results in more post-epilogue instructions being covered by correct locations, without the addition of false positives. |
Hi David, [I guess I should stop commissioning new metasyntactic variable names, reading them back now is awkward,]
In gdb, I enter "start" to begin the process and land in the 'main' function, then enter 's' to step into the 'foo' function. This lands me on the first instruction of the function, rather than at the prologue end, as you say.
Hmmm. That reasoning makes complete sense -- I have a strong feeling of "the compiler could get this right therefore it should" that doesn't want to admit this though. To my mind, parameters have defined values at the transfer of control into the function, and thus should have locations. Fiddling with the test case though I've managed to reproduce an incorrect location after the prologue_end flag, by replacing the foo function with: --------8<-------- int Compiling that with the same command line (although I've moved on to r351956) gives the code: --------8<-------- The line table: --------8<-------- 0x0000000000400480 10 0 1 0 0 is_stmt And the location for 'eyelids': --------8<-------- The prologue is nonexistent and ends on the first instruction of the function. Following your example putting a breakpoint in gdb on 'foo' and running, gdb breaks on address 0x400480, where eax is still uninitialised, and printing 'eyelids' gives an incorrect reading. ~ Thanks for the comprehensive test analysis, it really does sound like that code segment is causing damage. I would imagine that with that patch applied, for any function that has a prologue, all parameters would have no location until the prologue ends, is that right? If so and many debug users expect to be able to step-into a function and see parameters immediately (as I typically do) this creates a conflict between debug completeness and debug accuracy :/. That then goes back to whether parameters should be readable in the prologue. Confusingly, if I compile the original code at the top of this bug with -O0, then when stepping into 'foo' from 'main', gdb stops at the end of the prologue. But when the code is compiled with -O2, gdb decides to stop on the first instruction of the function. lldb-6.0 replicates this behaviour. I'll try and dig into why this is, although I suspect "what gcc does" will be at the bottom of that hole. |
Oh, sorry! I was under the assumption that s/step would ignore instructions with line numbers in the prologue. The GDB function that seems to be used for when the command steps into functions (handle_step_into_function in infrun.c) has code that seems to be intended to skip to the end of the prologue (which uses the target hook gdbarch_skip_prologue), but I have only very quickly skimmed through the code, and not looked at it in a debugger, so maybe there is some caveat that causes it to not trigger here. I guess that this might invalidate some assumptions I made in my earlier comments.
At least for the targets I have looked at, the emitPrologue() implementations emit instructions directly at the start of the block, effectively moving down pre-existing DBG_VALUEs, so in general I don't think we'll have DBG_VALUEs at the first instruction of the function, at least. Maybe one could fix that by, before emitting the prologue, stashing away parameter-referencing DBG_VALUEs that are placed at the start of the block, and later re-insert them back at the start? I guess that there are some exceptions to consider there, e.g. for DBG_VALUE referring to stack positions, but maybe it could at least apply for register DBG_VALUEs as a start? |
Yeah, this is a new "feature" to me as well. It's not immediately clear why debuggers act this way, but it seems to be an established behaviour.
I reckon this will work -- the prologue insertion can't change the register argument locations (AFAIK) or the rest of the function would have to change too, so the register debug locations must stay the same as well. Some kind of target customisation facility might be in order in case they insert exotic instructions/facilities in prologues? Exactly what is guaranteed by LLVM prologue / epilogues is out of my area of expertise though. |
Yeah, I'm going to vaguely reiterate what others have said here - it seems unlikely the debugger /should/ be trying to render values before the prologue is finished. If locations needed to be accurate there, then no one could use simple location descriptions for any variable (since they'd all need to describe the register in the prologue, and the memory location later, etc, even/especially in -O0 builds) - it'd all need to be in location lists, which would be verbose/inefficient. So I think it'd be worth understanding what the rules and conventions are about prologue locations and stepping behavior. Why does GDB (& LLDB? & whoever else) sometimes step passed the prologue and sometimes not? What are the conditions that dictate that & is it reasonable to only emit location descriptions that are accurate in the prologue under those conditions? Should we push back and suggest debuggers are making a sub-optimal choice here & should instead always step over the prologue/never expect a prologue-accurate location description? |
I imagine that our debugger team at SIE would tell us that it's the job of the compiler to do this via the is_stmt flag. I wonder if that's what's causing this behaviour? I'm not sure how well debuggers generally respect is_stmt. Address Line Column File ISA Discriminator Flags 0x0000000000000000 8 0 1 0 0 is_stmt
... but I always enjoy reading your testcases as a result :) ... |
Actually years ago Eric & I fixed a pretty significant GDB failure due to is_stmt - basically when asked to break at a certain function, GCC would start at the instruction that function started on and try to find the nearest enclosing statement - it would do so by searching backwards in the line table until it found an instruction flagged with is_stmt, then break on all the instructions between that is_stmt and the next. This would result in GDB breaking at the end of the previous emitted function and the beginning of the intended function... Suffice to say this was behavior that was too bad to say "we'll keep emitting debug info this way and GDB can fix this", we changed Clang to emit is_stmt in the prologue. Maybe GDB's been fixed by now, but I don't know. (& not sure what other issues might result from emitting fewer is_stmt)
|
Interesting. Do you remember if there was a Bugzilla PR written for that, or some other resource one could look at? I tried skimming through commits in some related files in lib/CodeGen/AsmPrinter, but I was unable to find something that seemed related to that issue. |
Looking back through some history, r225010 is a more recent is_stmt change, to omit it from lines with the same line number (rather than is_stmt for every column/discriminator change, as well as every line number change) Ah, here it is - yeah, hard to find. #13675 has some related discussion and the change that fixed the GDB issue I was describing was r169304. |
From SROA we get the following output: *** IR Dump After SROA *** And after instruction selection we have: *** IR Dump After X86 DAG->DAG Instruction Selection ***:Machine code for function foo: IsSSA, TracksLivenessFunction Live Ins: $rdi in %0 bb.0.entry: Later on, we move up an instruction with line number information, ending the prologue before the DBG_VALUE for the second field in eyelids: *** IR Dump After Machine Instruction Scheduler ***:Machine code for function foo: NoPHIs, TracksLivenessFunction Live Ins: $rdi in %0 0B bb.0.entry: Later on, DebugHandlerBase::beginFunction() incorrectly moves the start of the ranges to the beginning of the function. I'm not sure who's "misbehaving" here, if we want to get a valid debug location for the parameter at the start of the function. I'm not too well-versed with the code in SelectionDAG, but I at least noted that EmitFuncArgumentDbgValue() do not trigger here, since it does not look past the truncs and shifts. This means that we don't get any DBG_VALUEs directly at the start of the function. However, I don't know if it would make sense to look past and handle such "wrapping" instructions there (by amending the debug expressions). We would ideally want to emit a simple location description that says that the variable is located in RDI at the start of the block. Maybe SROA could help SelectionDAG along by, in addition to emitting each fragmented dbg.value, emit a preceding dbg.value which describes the whole parameter using %eyelids.coerce? |
Sounds like we should push the discussion of is_stmt to later or, or possibly another ticket?, So that we're all on the same page, fixing this (latest reproducer) is something I believe we should be doing rather than marking the argument undef/optimised-out, because it doesn't amount to describing the argument inside of the prologue. This is more about describing that fragment away from its use. (A use that could alternately have been sunk further down the function). David Stenberg wrote:
Hmmmm. IMHO this is a symptom of debuginfo-in-the-instruction-stream pain: this information is much better recorded as a property of the Function. My feeling is that fixing in SelectionDAG would be a special case too far: when handing that dbg.value, as you say it would need to look past the truncs and shifts for many dbg.values to see whether there's an Argument at the bottom, which would probably be expensive. And ultimately it just hacks around the fact that this information about arguments can float anywhere around a Function. The SROA option is almost certainly better, although I'm almost entirely unfamiliar with SROA. We should (TM) be able to describe the argument fragment using only the Argument itself (because the type must be fixed), and if we can rely on the rest of LLVM leaving those dbg.values at the front of the entry block, not only will it fix this issue, but we can delete the beginFunction() code and simplify lots of SelectionDAG as well. Maybe even add this to the IR Verifier as well? |
As the last comment is over a year old: This still reproduced at head (with the updated example from #c5) in lldb with dwarf4 and dwarf5: (lldb) s
eyelids should be {a = 1, b = 2} With gdb, I am seeing the following when stepping into the function: (gdb) s In both cases, after the next step, eyelids is shown correctly as {a = 1, b = 2}. |
mentioned in issue llvm/llvm-bugzilla-archive#40638 |
Extended Description
Using r350008 and the command line "clang-8 -O2 -g test.cpp -o a.out -fno-inline", in the code below a bogus value is reported for the value of 'eyelids' when entering the 'foo' function. The code:
--------8<--------
struct bees {
int a;
int b;
};
int
foo(struct bees eyelids)
{
return eyelids.a + eyelids.b;
}
int
main()
{
struct bees xyzzy = { 1, 2 };
return foo(xyzzy);
}
-------->8--------
Gdbs interpretation on stepping into foo:
(gdb) frame
#0 foo (eyelids=...) at test.cpp:8
8 {
(gdb) print eyelids
$1 = {a = 1, b = 4195504}
Compiling with g++ 6.3 reports eyelids.b=2 at the same location. Examining clangs output with llvm-dwarfdump, we generate the following expression for "eyelids" for the body of "foo":
DW_OP_reg5 RDI, DW_OP_piece 0x4, DW_OP_reg0 RAX, DW_OP_piece 0x4
when the value of eyelids is actually only passed in rdi (upper and lower portions). Compiling with g++ and -gstrict-dwarf correctly gives only "DW_OP_reg5 RDI". The body of "foo" moves eyelids.b to eax initially, it appears that the location of that fragment gets hoisted to cover the beginning of the function, even when eax isn't def'd yet. I'm 95% confident the code that does that is here [0], which appears to assume the first DBG_VALUE of a parameter can have its location forwarded to the function start.
That assumption might have been valid given that there's code in SelectionDAG [1] that emits DBG_VALUEs for parameters at the start of functions, although for byvals only. Perhaps it used to produce entry DBG_VALUEs for every parameter in the past?
There's a test in-tree that already exhibits this behaviour, test/DebugInfo/ARM/partial-subreg.ll , which builds the 'self' parameter from stack into registers over several instructions, but the location covers the whole function, incorrectly. I ran into this problem when fiddling elsewhere, the reproducer is to generalise the problem.
[0] https://github.com/llvm-mirror/llvm/blob/6cd86b7cd2bb90a97d8a75196d6c0a4365c49a8b/lib/CodeGen/AsmPrinter/DebugHandlerBase.cpp#L219
[1] https://github.com/llvm-mirror/llvm/blob/27f17bfee31bec92b918f4ca6a6f7a2a37e4d00c/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp#L818
The text was updated successfully, but these errors were encountered: