Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add un-inline support using addr2line #14

Merged
merged 17 commits into from
Feb 5, 2019

Conversation

jasonrhansen
Copy link
Collaborator

This commit adds support for #5, and adds command line flags --inline and --context.

The problem is finding a good way to test it. The perl version has no tests for it. They have a comment that says "these are tricky since they use addr2line, whose output will vary based on the test
system's binaries and symbol tables." I agree that this makes it difficult. Any ideas on how to approach this would be helpful.

@codecov-io
Copy link

codecov-io commented Feb 1, 2019

Codecov Report

Merging #14 into master will increase coverage by 4.37%.
The diff coverage is 71.73%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #14      +/-   ##
=========================================
+ Coverage   40.22%   44.6%   +4.37%     
=========================================
  Files           7       7              
  Lines         532     630      +98     
=========================================
+ Hits          214     281      +67     
- Misses        318     349      +31
Impacted Files Coverage Δ
src/bin/collapse-perf.rs 0% <0%> (ø) ⬆️
src/collapse/perf.rs 82.57% <69.23%> (-8.18%) ⬇️
tests/collapse-perf-tests.rs 93.61% <94.73%> (+5.92%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 048990b...bce7ee0. Read the comment docs.

src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Feb 1, 2019

This is a cool change, and pretty much how I expected it to turn out. Left a bunch of inline comments. Should also be far faster than stackcollapse-perf's support for inlining, since it doesn't have to shell out for every $pc!

As for testing, I sadly don't have a good answer for that either. One way would be for us to include a binary in the repo that is a compiled C program that we also have profiles for. We'd just want to make sure the compiler does inline stuff in it, but then it should make for a decent, and consistent, test case. What do you think?

@jonhoo
Copy link
Owner

jonhoo commented Feb 1, 2019

Also, seeing all that inlining code really makes me wish that addr2line had the code from the addr2line example program available as a function directly on the library (cc @FitZen @philipc @main--).

src/collapse/perf.rs Outdated Show resolved Hide resolved
@main--
Copy link

main-- commented Feb 1, 2019

Also, seeing all that inlining code really makes me wish that addr2line had the code from the addr2line example program available as a function directly on the library

I'm not sure what you mean? From what I can see the example does only three notable things: mmap, string parsing and string printing. The mmap is is 2 lines of code plus a crate while the other two exist basically only to closely mimic binutils addr2line. It's unclear to me how this would help here.

@jonhoo
Copy link
Owner

jonhoo commented Feb 1, 2019

@main-- oh, sorry, I should have been a bit more specific. The un_inline function introduced in this PR has to:

  • open the file
  • pass it to object::File::parse
  • pass that to Context::new
  • call Context::find_frames
  • iterate over the frames (with an extra unwrap it seems)
  • tease out the function name from each frame using (somewhat simplified)
    addr2line::demangle_auto(frame.function.map(|func| func.raw_name()), func.language))

To me at least, that's very similar to the flow of examples/addr2line.rs, and it seems unfortunate to have to repeat the whole boilerplate if this is a common use-case for addr2line (looking up function stack by address). In particular, I wonder whether addr2line could provide a convenience API for "create a new Context from a file" (maybe using mmap and from_sections internally if possible), and "iterate over the functions of an address" (including demangling the raw_name). Something like Context::from_file(&File) -> Context and Context::resolve_iter(u64) -> impl Iterator<Item = Function>, where Function is a struct that holds the demangled name, file, line number, etc.

@jonhoo
Copy link
Owner

jonhoo commented Feb 1, 2019

Oh, @jasonrhansen, just making sure you saw my comment about testing.

@jonhoo
Copy link
Owner

jonhoo commented Feb 1, 2019

Also, to be clear, it's fine if you make multiple commits, and then we just squash them all at the end. Instead of force-pushing all the time :)

@jasonrhansen
Copy link
Collaborator Author

@jonhoo, I did see your comment about testing and I agree with the general approach. Are you suggesting we write our own C program to use as a test case? Do you have any ideas of something fairly simple we could use?

Also, sorry about the force-pushing. I'm about to push a large commit with all my changes. I probably should have broken it up into several commits.

@jonhoo
Copy link
Owner

jonhoo commented Feb 1, 2019

Yeah, I was thinking we'd write our own simple C program. Or a Rust program would probably also be totally fine. Something with no external dependencies like:

#[inline(always)]
fn count_to(mut n: usize) {
    while n != 0 {
        n -= 1'
    }
}

#[inline(always)]
fn double_inline() {
    count_to(1_000_000);
}

#[inline(never)]
fn not_inlined() {
    count_to(1_000_000);
}

fn main() {
    count_to(1_000_000);
    double_inline();
    not_inlined();
}

We'd do a perf record --call-graph dwarf of running that program, run perf script, and store the output of that somewhere in the repository (adjust the constants if the file is too large). Then for the test we'd compare stackcollapse-perf with us with and without --inline and --context.

@jonhoo
Copy link
Owner

jonhoo commented Feb 1, 2019

As for force-pushing, don't worry about it. Going forward though, individual commits are easier to review, and then we can just squash it all at the end if we want!

@main--
Copy link

main-- commented Feb 2, 2019

@jonhoo Oh, I see. What you're suggesting makes sense, but the reason I originally structured the API this way is to avoid pulling in heavy external dependencies that are not strictly needed. For instance memmap is obviously just one of many ways to obtain code. Then there is object which is a very convenient way to deal with binaries across platforms but by no means a hard dependency of addr2line - here Context::new already is the convenience function in the sense that it fetches and constructs all the gimli sections for you. Constructing the entire Object in addr2line however is the point where the tradeoff between convenience vs flexibility is not worth it any more imo.

Bringing a file into memory and then calling object::File::parse on it may be a common usecase but I feel like this is best served by an example to copy from rather than an actual black-box function.

The frame iteration is basically a bog-standard for loop - except it has to be a FallibleIterator because the library lazily parses the DWARF whenever necessary and this might fail (e.g. malformed DWARF). This is impossible to abstract over without sacrificing the laziness and eagerly parsing and buffering all the frames before returning initially.

The demangling is only implemented this way in the addr2line example because you can turn it off. In your example you may just use FunctionName::demangle.

The crate already does a lot in terms of abstracting a complex task into simple parts. I agree that it could use some ergonomic improvements but consider that all of these papercuts at least have a reason. For instance the Frame struct is divided between function and location because those are completely independent in DWARF - one may exist while the other does not.

src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
src/collapse/perf.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Feb 2, 2019

We've moved the discussion with @main-- to gimli-rs/addr2line#110 (comment) so we can keep this PR to the point.

@jasonrhansen
Copy link
Collaborator Author

I finally started trying to test this, but I'm not sure how to proceed. I wrote the following simple program in C.

inline void __attribute__((always_inline)) count_to(unsigned int n) {
    for(int i = 0; i < n; i++);
}

inline void __attribute__((always_inline)) double_inline() {
    count_to(1000000000);
}

static void __attribute__ ((noinline)) not_inlined() {
    count_to(1000000000);
}

int main() {
    count_to(1000000000);
    double_inline();
    not_inlined();
}

Then did the following:

gcc -g inline_test.c
perf record --call-graph dwarf ./a.out
perf script > perfoutput

The problem I'm having is the --inline option doesn't seem to work with stackcollapse-perf.pl, or our implementation.

stackcollapse-perf.pl --inline perfoutput:

a.out;??;??;?? 3767
a.out;??;??;??;?? 1930
a.out;??;??;??;??;?? 1

cargo run --bin inferno-collapse-perf -- perfoutput --inline:

a.out 5698

So then I decided to try addr2line directly and took the address from this line 55ca69b98131 not_inlined+0x18 (/home/jrhansen/projects/rust/inferno/tests/a.out) in perfoutput to see if addr2line could tell me the file and function name.
I ran addr2line -a -e ./a.out -i -f -s -C 55ca69b98131
But this was the output:

0x000055ca69b98131
??
??:0

Next I ran objdump -d a.out. Here's the the section of the output for not_inlined:

0000000000001119 <not_inlined>:
    1119:       55                      push   %rbp
    111a:       48 89 e5                mov    %rsp,%rbp
    111d:       c7 45 f8 00 ca 9a 3b    movl   $0x3b9aca00,-0x8(%rbp)
    1124:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    112b:       eb 04                   jmp    1131 <not_inlined+0x18>
    112d:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
    1131:       8b 45 fc                mov    -0x4(%rbp),%eax
    1134:       39 45 f8                cmp    %eax,-0x8(%rbp)
    1137:       77 f4                   ja     112d <not_inlined+0x14>
    1139:       90                      nop
    113a:       5d                      pop    %rbp
    113b:       c3                      retq

Then I realized that addr2line is expecting the offset into the binary, so I ran addr2line -a -e ./a.out -i -f -s -C 112b and got:

0x000000000000112b
count_to
inline_test.c:3
not_inlined
inline_test.c:11

Much better! But now I don't know how to use addr2line with the addresses that are in the perf output since those aren't relative to the binary.

Any ideas?

@jonhoo
Copy link
Owner

jonhoo commented Feb 4, 2019

So, the issue here is basically this one: ASLR + position-independent executables are the default, and that gives a random offset to all executables. I thought there was a way to work around this, but can't remember it off the top of my head. For testing, we could just compile with -static -no-pie, but we'll probably want a better solution long-term that doesn't require people to change the way they profile their apps (i.e., we don't want to require them to use something like setarch -R to run their programs).

@jonhoo
Copy link
Owner

jonhoo commented Feb 4, 2019

FWIW, I filed #31

@jasonrhansen
Copy link
Collaborator Author

Thanks for the suggestion to compile with -static -no-pie, @jonhoo. I'll just do that for our test binary for now. I agree we need a better long-term solution, but I want to at least be able to test the functionality we currently have.

tests/collapse-perf-tests.rs Outdated Show resolved Hide resolved
@jonhoo
Copy link
Owner

jonhoo commented Feb 5, 2019

\o/ I think those two are the last two questions!

@jonhoo jonhoo merged commit f15e8ca into jonhoo:master Feb 5, 2019
@jonhoo
Copy link
Owner

jonhoo commented Feb 5, 2019

\o/ Thank you so much @jasonrhansen!

@jonhoo jonhoo mentioned this pull request Feb 5, 2019
JordiChauzi pushed a commit to JordiChauzi/inferno that referenced this pull request Feb 10, 2019
This patch adds full support for the `--inline` and `--context` flags of
`stackcollapse-perf` by relying on the `addr2line` crate to resolve symbol
addresses instead of the names reported by perf, which _can_ lead to more
informative frame names.

Note that, on modern systems, address-space randomization means that we often
do not have the "true" address of functions in the trace from perf script. This
issue also plagues stackcollapse-perf. There are some possible workarounds
(jonhoo#31), but nothing definite as of yet. For the time being we're aiming for
parity.

This patch also adds another set of tests that are _not_ from the upstream
FrameGraph repository. Specifically, it compares the output of
`stackcollapse-perf` with `--inline` and `--context` with ours on a simple
binary that we've compiled and profiled without randomization.

This patch _also_ adopts the `log` crate, and emits appropriate log messages
whenever the input is malformed in some way. We explicitly do not return with
an error, since we want to be liberal in what we accept (just like
`stackcollapse-perf`).
JordiChauzi pushed a commit to JordiChauzi/inferno that referenced this pull request Feb 10, 2019
This patch adds full support for the `--inline` and `--context` flags of
`stackcollapse-perf` by relying on the `addr2line` crate to resolve symbol
addresses instead of the names reported by perf, which _can_ lead to more
informative frame names.

Note that, on modern systems, address-space randomization means that we often
do not have the "true" address of functions in the trace from perf script. This
issue also plagues stackcollapse-perf. There are some possible workarounds
(jonhoo#31), but nothing definite as of yet. For the time being we're aiming for
parity.

This patch also adds another set of tests that are _not_ from the upstream
FrameGraph repository. Specifically, it compares the output of
`stackcollapse-perf` with `--inline` and `--context` with ours on a simple
binary that we've compiled and profiled without randomization.

This patch _also_ adopts the `log` crate, and emits appropriate log messages
whenever the input is malformed in some way. We explicitly do not return with
an error, since we want to be liberal in what we accept (just like
`stackcollapse-perf`).
@jasonrhansen jasonrhansen deleted the un-inlining branch February 12, 2019 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants