Add un-inline support using addr2line #14

jasonrhansen · 2019-02-01T03:38:43Z

This commit adds support for #5, and adds command line flags --inline and --context.

The problem is finding a good way to test it. The perl version has no tests for it. They have a comment that says "these are tricky since they use addr2line, whose output will vary based on the test
system's binaries and symbol tables." I agree that this makes it difficult. Any ideas on how to approach this would be helpful.

codecov-io · 2019-02-01T04:43:03Z

Codecov Report

Merging #14 into master will increase coverage by 4.37%.
The diff coverage is 71.73%.

@@            Coverage Diff            @@
##           master     #14      +/-   ##
=========================================
+ Coverage   40.22%   44.6%   +4.37%     
=========================================
  Files           7       7              
  Lines         532     630      +98     
=========================================
+ Hits          214     281      +67     
- Misses        318     349      +31

Impacted Files	Coverage Δ
src/bin/collapse-perf.rs	`0% <0%> (ø)`	⬆️
src/collapse/perf.rs	`82.57% <69.23%> (-8.18%)`	⬇️
tests/collapse-perf-tests.rs	`93.61% <94.73%> (+5.92%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 048990b...bce7ee0. Read the comment docs.

src/collapse/perf.rs

jonhoo · 2019-02-01T05:41:01Z

This is a cool change, and pretty much how I expected it to turn out. Left a bunch of inline comments. Should also be far faster than stackcollapse-perf's support for inlining, since it doesn't have to shell out for every $pc!

As for testing, I sadly don't have a good answer for that either. One way would be for us to include a binary in the repo that is a compiled C program that we also have profiles for. We'd just want to make sure the compiler does inline stuff in it, but then it should make for a decent, and consistent, test case. What do you think?

jonhoo · 2019-02-01T05:42:57Z

Also, seeing all that inlining code really makes me wish that addr2line had the code from the addr2line example program available as a function directly on the library (cc @FitZen @philipc @main--).

src/collapse/perf.rs

main-- · 2019-02-01T12:58:39Z

Also, seeing all that inlining code really makes me wish that addr2line had the code from the addr2line example program available as a function directly on the library

I'm not sure what you mean? From what I can see the example does only three notable things: mmap, string parsing and string printing. The mmap is is 2 lines of code plus a crate while the other two exist basically only to closely mimic binutils addr2line. It's unclear to me how this would help here.

jonhoo · 2019-02-01T15:05:06Z

@main-- oh, sorry, I should have been a bit more specific. The un_inline function introduced in this PR has to:

open the file
pass it to object::File::parse
pass that to Context::new
call Context::find_frames
iterate over the frames (with an extra unwrap it seems)

tease out the function name from each frame using (somewhat simplified)

addr2line::demangle_auto(frame.function.map(|func| func.raw_name()), func.language))

To me at least, that's very similar to the flow of examples/addr2line.rs, and it seems unfortunate to have to repeat the whole boilerplate if this is a common use-case for addr2line (looking up function stack by address). In particular, I wonder whether addr2line could provide a convenience API for "create a new Context from a file" (maybe using mmap and from_sections internally if possible), and "iterate over the functions of an address" (including demangling the raw_name). Something like Context::from_file(&File) -> Context and Context::resolve_iter(u64) -> impl Iterator<Item = Function>, where Function is a struct that holds the demangled name, file, line number, etc.

jonhoo · 2019-02-01T18:12:54Z

Oh, @jasonrhansen, just making sure you saw my comment about testing.

jonhoo · 2019-02-01T21:16:36Z

Also, to be clear, it's fine if you make multiple commits, and then we just squash them all at the end. Instead of force-pushing all the time :)

jasonrhansen · 2019-02-01T21:32:34Z

@jonhoo, I did see your comment about testing and I agree with the general approach. Are you suggesting we write our own C program to use as a test case? Do you have any ideas of something fairly simple we could use?

Also, sorry about the force-pushing. I'm about to push a large commit with all my changes. I probably should have broken it up into several commits.

jonhoo · 2019-02-01T21:51:35Z

Yeah, I was thinking we'd write our own simple C program. Or a Rust program would probably also be totally fine. Something with no external dependencies like:

#[inline(always)]
fn count_to(mut n: usize) {
    while n != 0 {
        n -= 1'
    }
}

#[inline(always)]
fn double_inline() {
    count_to(1_000_000);
}

#[inline(never)]
fn not_inlined() {
    count_to(1_000_000);
}

fn main() {
    count_to(1_000_000);
    double_inline();
    not_inlined();
}

We'd do a perf record --call-graph dwarf of running that program, run perf script, and store the output of that somewhere in the repository (adjust the constants if the file is too large). Then for the test we'd compare stackcollapse-perf with us with and without --inline and --context.

jonhoo · 2019-02-01T21:52:06Z

As for force-pushing, don't worry about it. Going forward though, individual commits are easier to review, and then we can just squash it all at the end if we want!

main-- · 2019-02-02T13:30:06Z

@jonhoo Oh, I see. What you're suggesting makes sense, but the reason I originally structured the API this way is to avoid pulling in heavy external dependencies that are not strictly needed. For instance memmap is obviously just one of many ways to obtain code. Then there is object which is a very convenient way to deal with binaries across platforms but by no means a hard dependency of addr2line - here Context::new already is the convenience function in the sense that it fetches and constructs all the gimli sections for you. Constructing the entire Object in addr2line however is the point where the tradeoff between convenience vs flexibility is not worth it any more imo.

Bringing a file into memory and then calling object::File::parse on it may be a common usecase but I feel like this is best served by an example to copy from rather than an actual black-box function.

The frame iteration is basically a bog-standard for loop - except it has to be a FallibleIterator because the library lazily parses the DWARF whenever necessary and this might fail (e.g. malformed DWARF). This is impossible to abstract over without sacrificing the laziness and eagerly parsing and buffering all the frames before returning initially.

The demangling is only implemented this way in the addr2line example because you can turn it off. In your example you may just use FunctionName::demangle.

The crate already does a lot in terms of abstracting a complex task into simple parts. I agree that it could use some ergonomic improvements but consider that all of these papercuts at least have a reason. For instance the Frame struct is divided between function and location because those are completely independent in DWARF - one may exist while the other does not.

src/collapse/perf.rs

jonhoo · 2019-02-02T16:43:09Z

We've moved the discussion with @main-- to gimli-rs/addr2line#110 (comment) so we can keep this PR to the point.

jasonrhansen · 2019-02-03T23:42:35Z

I finally started trying to test this, but I'm not sure how to proceed. I wrote the following simple program in C.

inline void __attribute__((always_inline)) count_to(unsigned int n) {
    for(int i = 0; i < n; i++);
}

inline void __attribute__((always_inline)) double_inline() {
    count_to(1000000000);
}

static void __attribute__ ((noinline)) not_inlined() {
    count_to(1000000000);
}

int main() {
    count_to(1000000000);
    double_inline();
    not_inlined();
}

Then did the following:

gcc -g inline_test.c
perf record --call-graph dwarf ./a.out
perf script > perfoutput

The problem I'm having is the --inline option doesn't seem to work with stackcollapse-perf.pl, or our implementation.

stackcollapse-perf.pl --inline perfoutput:

a.out;??;??;?? 3767
a.out;??;??;??;?? 1930
a.out;??;??;??;??;?? 1

cargo run --bin inferno-collapse-perf -- perfoutput --inline:

a.out 5698

So then I decided to try addr2line directly and took the address from this line 55ca69b98131 not_inlined+0x18 (/home/jrhansen/projects/rust/inferno/tests/a.out) in perfoutput to see if addr2line could tell me the file and function name.
I ran addr2line -a -e ./a.out -i -f -s -C 55ca69b98131
But this was the output:

0x000055ca69b98131
??
??:0

Next I ran objdump -d a.out. Here's the the section of the output for not_inlined:

0000000000001119 <not_inlined>:
    1119:       55                      push   %rbp
    111a:       48 89 e5                mov    %rsp,%rbp
    111d:       c7 45 f8 00 ca 9a 3b    movl   $0x3b9aca00,-0x8(%rbp)
    1124:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    112b:       eb 04                   jmp    1131 <not_inlined+0x18>
    112d:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
    1131:       8b 45 fc                mov    -0x4(%rbp),%eax
    1134:       39 45 f8                cmp    %eax,-0x8(%rbp)
    1137:       77 f4                   ja     112d <not_inlined+0x14>
    1139:       90                      nop
    113a:       5d                      pop    %rbp
    113b:       c3                      retq

Then I realized that addr2line is expecting the offset into the binary, so I ran addr2line -a -e ./a.out -i -f -s -C 112b and got:

0x000000000000112b
count_to
inline_test.c:3
not_inlined
inline_test.c:11

Much better! But now I don't know how to use addr2line with the addresses that are in the perf output since those aren't relative to the binary.

Any ideas?

src/collapse/perf.rs

jonhoo · 2019-02-04T15:49:30Z

So, the issue here is basically this one: ASLR + position-independent executables are the default, and that gives a random offset to all executables. I thought there was a way to work around this, but can't remember it off the top of my head. For testing, we could just compile with -static -no-pie, but we'll probably want a better solution long-term that doesn't require people to change the way they profile their apps (i.e., we don't want to require them to use something like setarch -R to run their programs).

jonhoo · 2019-02-04T16:10:56Z

FWIW, I filed #31

jasonrhansen · 2019-02-04T17:50:03Z

Thanks for the suggestion to compile with -static -no-pie, @jonhoo. I'll just do that for our test binary for now. I agree we need a better long-term solution, but I want to at least be able to test the functionality we currently have.

tests/collapse-perf-tests.rs

tests/data/results/perf-inline-counter-collapsed-inline+context.txt

jonhoo · 2019-02-05T04:32:09Z

\o/ I think those two are the last two questions!

jonhoo · 2019-02-05T16:09:48Z

\o/ Thank you so much @jasonrhansen!

This patch adds full support for the `--inline` and `--context` flags of `stackcollapse-perf` by relying on the `addr2line` crate to resolve symbol addresses instead of the names reported by perf, which _can_ lead to more informative frame names. Note that, on modern systems, address-space randomization means that we often do not have the "true" address of functions in the trace from perf script. This issue also plagues stackcollapse-perf. There are some possible workarounds (jonhoo#31), but nothing definite as of yet. For the time being we're aiming for parity. This patch also adds another set of tests that are _not_ from the upstream FrameGraph repository. Specifically, it compares the output of `stackcollapse-perf` with `--inline` and `--context` with ours on a simple binary that we've compiled and profiled without randomization. This patch _also_ adopts the `log` crate, and emits appropriate log messages whenever the input is malformed in some way. We explicitly do not return with an error, since we want to be liberal in what we accept (just like `stackcollapse-perf`).

Add un-inline support using addr2line

57d92c3

jasonrhansen force-pushed the un-inlining branch from 540068e to 57d92c3 Compare February 1, 2019 04:09

jonhoo reviewed Feb 1, 2019

View reviewed changes