Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance by parsing function contents lazily #178

Merged
merged 5 commits into from
Jul 2, 2020

Conversation

philipc
Copy link
Contributor

@philipc philipc commented Jun 28, 2020

This is a significant rebase of the great work done by @mstange in #176 . Closes #176 .

The description from that PR still applies:

  • There are now two levels of function parsing laziness.
    • The first level creates only the function ranges for "outer functions" (DW_TAG_subprogram) and writes down each outer function's dw_die_offset. By necessity, when parsing the list, we need to iterate over all abbrevs and attributes of the unit, including the ones for any nested tags for inlined functions, but we try to keep the work we do for them to a minimum.
    • The second level is done once we look up an address that falls into an outer function's address range. At that point, we parse the entries for that outer function again, starting at the recorded dw_die_offset. We parse the function name and all functions that are inlined into it, with their names and other data.
  • This two-level laziness automatically makes it so that we don't parse the names of DW_TAG_subprogram instances that don't have address ranges.
  • Reduced allocations for inline functions: For nested inline functions, we no longer have a Vec of child inlines on each inline function. Instead, we now have one Vec with all descendant inline_address_ranges on the OuterFunction.
  • We do a binary search when looking up the inline functions for an address within an outer function.

Differences from that PR:

  • cleaner commit history
  • new benchmarks
  • use LazyCell instead of Rc/RefCell, since this makes the code easier to understand
  • less factoring out of attribute parsing, since this slightly affects performance
  • cosmetic changes to minimize differences to master

This PR was written by completely rewriting the commits instead of doing a normal rebase. Hopefully I haven't left out anything significant.

@mstange Could you please review this, and check against the benchmarks you were using? And thanks again for your work on this.

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 28, 2020

You can use Co-authored-by: https://help.github.com/en/github/committing-changes-to-your-project/creating-a-commit-with-multiple-authors

We previously had no clean way of indicating that there were no frames,
and this would have become harder to support for future changes.

Co-authored-by: Markus Stange <[email protected]>
@philipc philipc force-pushed the pr-176-rework branch 2 times, most recently from d50cb93 to 8789de0 Compare June 30, 2020 06:47
@mstange
Copy link
Contributor

mstange commented Jul 2, 2020

I'm reviewing this now.

Copy link
Contributor

@mstange mstange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just one or two comments that need adjusting for the different variable names.

Thanks!

src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
.as_ref()
.map_err(Error::clone)
.map_err(Error::clone)?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good, this is better than what I did because you store the error, rather than re-parsing errored functions on every call.

gimli::DW_TAG_subprogram => {
Function::skip(entries, abbrev, next_depth)?;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Yeah, I think you're handling this more correctly than I was. I think my PR was siphoning inlines from nested subprograms into the outer subprogram.

philipc and others added 4 commits July 2, 2020 18:04
And use only a single list of inlined function addresses per function.
This reduces the overall number of allocations, and will allow lazy
parsing of inlined functions in future.

Since we now need to search through a larger list, we must use a
binary search to retain query performance.

Co-authored-by: Markus Stange <[email protected]>
Currently the same as function parsing.
Initially only parse the function address ranges.

Delay parsing other details (name and inlined functions) until a probe
matches.

 name                                        before ns/iter  after ns/iter  diff ns/iter   diff %  speedup
 context_new_and_query_location_rc           3,289,458       2,797,820          -491,638  -14.95%   x 1.18
 context_new_and_query_location_slice        1,387,033       1,092,108          -294,925  -21.26%   x 1.27
 context_new_and_query_with_functions_rc     3,247,986       2,864,962          -383,024  -11.79%   x 1.13
 context_new_and_query_with_functions_slice  1,423,648       1,085,133          -338,515  -23.78%   x 1.31
 context_new_parse_functions_rc              53,668,984      23,596,242      -30,072,742  -56.03%   x 2.27
 context_new_parse_functions_slice           40,013,306      18,342,944      -21,670,362  -54.16%   x 2.18
 context_new_parse_inlined_functions_rc      54,060,518      53,382,830         -677,688   -1.25%   x 1.01
 context_new_parse_inlined_functions_slice   40,016,318      40,217,181          200,863    0.50%   x 1.00
 context_new_parse_lines_rc                  12,129,688      11,873,223         -256,465   -2.11%   x 1.02
 context_new_parse_lines_slice               7,494,676       7,534,801            40,125    0.54%   x 0.99
 context_new_rc                              2,304,944       2,365,795            60,851    2.64%   x 0.97
 context_new_slice                           696,658         681,423             -15,235   -2.19%   x 1.02
 context_query_location_rc                   1,034,087       1,004,797           -29,290   -2.83%   x 1.03
 context_query_location_slice                997,554         986,323             -11,231   -1.13%   x 1.01
 context_query_with_functions_rc             3,167,282       3,233,303            66,021    2.08%   x 0.98
 context_query_with_functions_slice          2,865,103       2,839,624           -25,479   -0.89%   x 1.01

Co-authored-by: Markus Stange <[email protected]>
@philipc philipc merged commit 27f1ebe into gimli-rs:master Jul 2, 2020
@philipc philipc deleted the pr-176-rework branch July 2, 2020 08:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants