Clark Gaebel edited this page Apr 16, 2022 · 25 revisions

Why do my stack frames stairstep downwards and just generally look very broken?

magic-trace reconstructs stack frames from a history of your program's control flow. That job is straightforward if your program creates and closes stack frames with call and ret instructions. The actual constraint is roughly this: every transfer of control between functions must be a call, a ret, or a tail-position jmp. There's an impedance mismatch between that rule and constructs like C's longjmp or C++ exceptions.
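To make the failure mode concrete, here is a toy sketch of stack reconstruction from control-flow events. It is not magic-trace's actual implementation: the `(kind, destination)` event format, the `reconstruct` function, and the function names in the example are all made up for illustration. The sketch assumes every jmp between functions is a tail call, which is exactly the assumption a longjmp-style unwind violates.

```python
from typing import List, Sequence, Tuple

# Hypothetical event format: (kind, destination function).
# kind is "call", "ret", or "jmp" (a jump that lands in another function).
Event = Tuple[str, str]


def reconstruct(events: Sequence[Event], initial: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return the inferred stack (outermost frame first) after each event."""
    stack = list(initial)
    snapshots = []
    for kind, dest in events:
        if kind == "call":
            stack.append(dest)          # open a new frame
        elif kind == "ret":
            if len(stack) > 1:
                stack.pop()             # close the innermost frame
        elif kind == "jmp":
            stack[-1] = dest            # assume a tail call: replace the innermost frame
        else:
            raise ValueError(f"unknown event kind: {kind}")
        snapshots.append(tuple(stack))
    return snapshots


# try_block calls a, a calls b, and b unwinds straight back into try_block
# (think longjmp). In the trace that unwind is just a jmp.
one_iteration = [("call", "a"), ("call", "b"), ("jmp", "try_block")]
snapshots = reconstruct(one_iteration * 3, ["main", "try_block"])

# The real stack depth after every unwind is 2 (main -> try_block).
depths_after_unwind = [len(s) for s in snapshots[2::3]]
print(depths_after_unwind)  # [4, 6, 8]
```

Each unwind leaves a stale frame for `a` behind, so the inferred stack grows by two frames per iteration even though the real stack does not. That is the stairstep. The innermost frame is still `try_block` after every unwind, because it comes from where control actually landed.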

We've done some work to make some popular exotic situations behave well, like most exceptions in OCaml. But, in general, magic-trace has limited support for custom control flow. If you do something fancy in a trace, your stack frames will look a little wonky. There's no general answer for this; we'll need to add explicit code to magic-trace for each language's and each runtime's custom control flow. We appreciate any incremental progress towards that goal from the community.

For what it's worth, the bottom-most stack frame does not have this problem. That's always trustworthy regardless of the control flow you used to get there.

You might be thinking: "perf doesn't have this problem, so why does magic-trace?" The reason is that perf samples your program at regular intervals and walks the stack. magic-trace doesn't have that luxury. It doesn't actually walk the stack; it merely gets a copy of your program's control flow and reconstructs stack frames from that. perf in LBR mode has the same caveats.

What do stack frames that say "[unknown]" mean?

magic-trace was unable to determine the function name. The root cause is usually that your program is missing debug symbols. Some common reasons that can happen are:

  • Your program was compiled without debug symbols.
  • You're trying to trace into a globally installed library which doesn't have debug symbols. You may need to install a -dbg variant of the library from your package manager.
  • There's a bug in magic-trace. File an issue if you think so!

What do stack frames that say "[untraced]" mean?

That's probably where you entered into the kernel. If you want to peek behind that curtain, enable kernel tracing by passing -trace-include-kernel to magic-trace.

If you've enabled kernel tracing, the next most likely reason is that your process context switched out. You should be able to tell, because the nearest stack frames will mention scheduling-related terms in their names.

There are other, less likely reasons this can happen, too. We don't know all of them, but they tend to be relatively niche, like attempting to trace into a secure enclave.

What do arrows that say "[decode error: Overflow packet]" mean?

Intel PT generated an "OVF packet". The Intel Software Developer's Manual says:

325462-sdm-vol-1-2abcd-3abcd.pdf

It's a little vague, but based on our reading of this, it happens when the application and Intel PT together use more memory bandwidth than is available. When that happens, the application takes priority and Intel PT drops packets. People have also noticed drops around C-state transitions, but we haven't been able to find any documentation from Intel to corroborate that.

When magic-trace sees an overflow, it clears all stack frames and the generated trace may look discontinuous around that point. Please file an issue if this looks severely broken. We do try hard to behave right in this scenario and it's something we've screwed up before.
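As a rough illustration of why the trace looks discontinuous, here is a toy sketch of a reconstruction loop that throws away its stack when it sees an overflow. The event format and the `reconstruct_with_overflow` function are hypothetical and much simpler than what magic-trace really does.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical event format: (kind, function name or None).
# kind is "call", "ret", or "overflow".
Event = Tuple[str, Optional[str]]


def reconstruct_with_overflow(events: Sequence[Event]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (label, inferred stack) pairs, one per event."""
    stack: List[str] = []
    timeline = []
    for kind, name in events:
        if kind == "overflow":
            # Packets were dropped, so we no longer know which frames are
            # open. Forget everything and start over from an empty stack.
            stack.clear()
            timeline.append(("gap", ()))
            continue
        if kind == "call" and name is not None:
            stack.append(name)
        elif kind == "ret" and stack:
            stack.pop()
        # A ret with an empty stack returns into a frame we never saw
        # being opened, so there is nothing to close.
        timeline.append((kind, tuple(stack)))
    return timeline


events = [
    ("call", "main"),
    ("call", "parse"),
    ("overflow", None),
    ("call", "helper"),
    ("ret", None),
    ("ret", None),
]
timeline = reconstruct_with_overflow(events)
for label, stack in timeline:
    print(label, stack)
```

After the gap, `helper` appears as if it were the outermost frame, even though `main` and `parse` were still running. The frames on either side of an overflow don't connect, which is the discontinuity you see in the viewer.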

What do the little arrows in my traces mean?

They're events that took "zero" time. Of course, nothing takes zero time; this is an artifact of how Intel PT works. Intel PT only provides timing updates every few (5-ish) events. So if you trace a function call and return before any timing updates are sent, it looks to magic-trace like the event took no time at all.
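A small sketch of the effect, assuming a decoder that stamps each event with the most recent timing update. That is a simplification, and the stream format and the `durations` function are hypothetical:

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical packet stream: ("time", timestamp), ("call", name), or ("ret", None).
Packet = Tuple[str, Union[int, str, None]]


def durations(stream: Sequence[Packet]) -> List[Tuple[str, int]]:
    """Pair each call with its ret and report (function, duration)."""
    now = 0
    open_calls: List[Tuple[str, int]] = []   # (function name, start time)
    finished = []
    for kind, value in stream:
        if kind == "time":
            now = value                      # the only source of time
        elif kind == "call":
            open_calls.append((value, now))
        elif kind == "ret" and open_calls:
            name, start = open_calls.pop()
            finished.append((name, now - start))
    return finished


stream = [
    ("time", 100),
    ("call", "outer"),
    ("call", "tiny"),
    ("ret", None),      # tiny returns before the next timing update
    ("time", 180),
    ("ret", None),      # outer returns after it
]
print(durations(stream))  # [('tiny', 0), ('outer', 80)]
```

`tiny` starts and ends between two timing updates, so both of its events carry the same timestamp and it shows up with a duration of zero.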

Why do I see trace data following a stop symbol?

Tracing doesn't stop the instant the stop symbol is hit: we send a signal to a perf process, that process has to be scheduled, and so on. We could filter out the data recorded in the meantime, but figured that most users would rather have the extra data than not have it. There's a search bar at the top of Perfetto if you're having trouble finding your stop symbol.

Why does the snapshot symbol marker not perfectly line up with when my stop symbol was called?

The time is sampled in two different locations. The stack timeline view is reconstructed from timestamps produced by Intel PT, while the snapshot markers come from the timestamp associated with the breakpoint hit event magic-trace receives from perf. In our testing, the skid between these two appears to be ~3-4us.

Why is my snapshot symbol not triggering?

Maybe that function was inlined? Tell your compiler not to inline it. If that still doesn't work, try to create a minimal reproducible example of the problem and file an issue.

Why did you open source this?

Two reasons:

  1. We want to work with people that care about software and its performance. If you're reading this, you should at least consider applying to work at Jane Street. We are producers and consumers of tools like this, and we'd love for you to join us.

  2. We need help from the community to really make magic-trace shine. Intel PT is a relatively niche feature; it suffers from rough edges, lack of documentation, and general disuse by the software development community. We hope that if more people see how useful it is, more people will work to improve it.