Cache function attributes #152
Conversation
Thanks for this! I suspect that the difference between the benchmarks here and in the backtrace crate may be due to usage? With caching it definitely makes sense that parsing is a bit slower (since it has to build a cache), but the backtrace crate's use case is repeatedly reading the same pre-parsed information, which would easily benefit from faster caches. In that sense, perhaps landing this depends on the envisioned use cases for addr2line? If it's expected to be "read once and move on", this wouldn't be a great change to land, but if it's intended to be more backtrace-crate-like, where you parse once and read many times, then this may make more sense. If there are various config options, I don't mind tweaking the integration in the backtrace crate as well!
(Have not looked at the code yet)
I've been thinking in the back of my mind that our addr2line benchmarks are measuring the wrong thing, or at least not a common use case (the one backtrace-rs has). The addr2line benchmarks, last I looked at them in detail, were symbolicating every address in a bunch of functions once, in order, and never re-symbolicating any of those addresses again. That is something a binary analysis tool might do, but at the other end of the spectrum is a sampling profiler, where the young frames change at a fairly frequent rate but the older frames are most often the same. In that scenario we're repeatedly symbolicating the same functions, and caching should be a huge win.

I think the way that backtrace-rs is commonly used falls somewhere between those two ends of the spectrum, but it is probably most often used to capture a single backtrace for an error, and that's it. That use case is probably best served by prioritizing a lower memory footprint over the speed of re-symbolicating addresses we've seen before. Although I also haven't looked at the benchmarks in backtrace-rs, and have no idea whether they are representative of its common uses.

One more thought: if we need a fast, LRU associative cache, I happen to have one right here: https://github.com/fitzgen/associative-cache
I wonder if, rather than having every file/line/column be cache-able in an Option, we had a fixed-size LRU cache in front of the symbolication (either in this crate, as a new struct/method, or in consumer crates). This would let us balance the memory overhead to whatever is appropriate for the given use case, and I think it should be faster as well, since the do-we-have-it-cached-or-not check happens only once at the outer boundary, rather than in the middle of the symbolication loop. But maybe I'm not reading closely enough to what is happening here (and I wouldn't say I'm super up to date with this code anymore), so take what I am saying with a pinch of salt...
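A minimal sketch of what such a fixed-size LRU in front of symbolication could look like. SymbolCache and the resolve closure are hypothetical stand-ins, not addr2line API; a real consumer would call into addr2line inside the closure:

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-capacity LRU mapping addresses to symbolication
/// results; `resolve` stands in for the expensive DWARF lookup.
struct SymbolCache {
    capacity: usize,
    entries: VecDeque<(u64, String)>, // most recently used at the front
}

impl SymbolCache {
    fn new(capacity: usize) -> Self {
        SymbolCache { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    fn lookup(&mut self, addr: u64, resolve: impl FnOnce(u64) -> String) -> String {
        if let Some(pos) = self.entries.iter().position(|&(a, _)| a == addr) {
            // Hit: move the entry to the front so it stays resident.
            let entry = self.entries.remove(pos).unwrap();
            self.entries.push_front(entry);
            return self.entries[0].1.clone();
        }
        // Miss: do the expensive symbolication exactly once.
        let result = resolve(addr);
        if self.entries.len() == self.capacity {
            self.entries.pop_back(); // evict the least recently used
        }
        self.entries.push_front((addr, result.clone()));
        result
    }
}

fn main() {
    let mut cache = SymbolCache::new(64);
    let loc = cache.lookup(0x1234, |addr| format!("fn_at_{:#x}", addr));
    // The second lookup is a hit, so the closure is never called.
    assert_eq!(loc, cache.lookup(0x1234, |_| unreachable!()));
    println!("{}", loc);
}
```

Note the property the comment above relies on: the cached-or-not check happens once per address at the outer boundary, and memory is bounded by the chosen capacity rather than by the size of the debug info.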
I've tried to isolate various parts of the code in different benchmarks. So for the benchmarks most relevant to this PR:
The benchmarks in backtrace-rs are making a single trace (= a small number of addresses) while reusing the global cache, so they are effectively benchmarking querying only; parsing is mostly ignored since it only occurs in the first iteration of the benchmark loop. I'm actually surprised that backtrace-rs saw an improvement, but …
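To sketch the difference between the two benchmark shapes (parsing amortized outside the loop versus paid on every iteration), here is a minimal criterion example. The parse and query functions are trivial hypothetical stand-ins, not the real code paths from either crate:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Trivial stand-ins for the real work (hypothetical, not addr2line's API).
fn parse() -> Vec<u64> {
    (0..1000).collect()
}

fn query(ctx: &[u64], addr: u64) -> u64 {
    ctx[addr as usize % ctx.len()]
}

// Shape 1: parse outside the loop, as backtrace-rs effectively does via
// its global cache, so only query time is measured.
fn bench_query_only(c: &mut Criterion) {
    let ctx = parse();
    c.bench_function("query_only", |b| b.iter(|| query(&ctx, 0x1234)));
}

// Shape 2: parse inside the loop, so parsing cost dominates the numbers.
fn bench_parse_and_query(c: &mut Criterion) {
    c.bench_function("parse_and_query", |b| {
        b.iter(|| {
            let ctx = parse();
            query(&ctx, 0x1234)
        })
    });
}

criterion_group!(benches, bench_query_only, bench_parse_and_query);
criterion_main!(benches);
```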
Maybe I misunderstand, but we don't really cache file/line/column in Options. Prior to this PR we never cached them at all, and after this PR the caching is at the compilation unit level. The Option used for file/line/column only signifies the existence of these in the DWARF, and I changed that in this PR to reduce the memory size, since DWARF never uses 0. Maybe backtrace-rs already does have a simple LRU at the file level (…).

As for the intended usage of addr2line... I'm not using it anywhere, so I don't know :) Looking at crates.io, there is backtrace-rs and py-spy (a sampling profiler). I also know of not-perf (another sampling profiler).
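The "DWARF never uses 0" observation is what makes that representation free: Rust guarantees that an Option of a NonZero integer has no space overhead, because the all-zeros bit pattern serves as the None niche. A quick sketch (standard library only):

```rust
use std::mem::size_of;
use std::num::{NonZeroU32, NonZeroU64};

fn main() {
    // Guaranteed by the standard library: 0 is the None niche.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>()); // 4 bytes
    assert_eq!(size_of::<Option<NonZeroU64>>(), size_of::<u64>()); // 8 bytes
    // A plain Option<u32> needs a separate discriminant plus padding
    // (8 bytes on typical targets).
    println!("Option<u32>: {} bytes", size_of::<Option<u32>>());
}
```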
FWIW I would definitely take the …
Reduces memory usage.
(force-pushed from 8842c95 to b6d719b)
I've figured out the problem with the addr2line benchmarks and updated this PR with the fix. The new benchmarks look better, still slower for initial parsing as expected, but much better for repeated queries:
@fitzgen I'm happier to merge this now, if you want to review?
LGTM!
I'm not sure about this change, and I'm inclined to not merge it for now. While it does show benefits for the backtrace-rs benchmarks, it doesn't for the addr2line benchmarks, and it significantly slows down the parsing and increases the memory usage.
Memory usage for the parse_functions_slice test increased from 19.7MB to 24.0MB.

In order to lower the increase in memory usage, this PR also changes lines and columns to u32, which I think is fine because that's a reasonable limit and it's all backtrace-rs uses anyway. This change is why the lines benchmarks improved above, and it's probably worth merging this part at least.

cc @alexcrichton
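To make the size arithmetic behind the u32 change concrete, here is a hedged sketch with hypothetical location structs — the real addr2line types differ, but the per-entry saving comes from the same two moves (narrower integers plus the NonZero niche):

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Hypothetical before/after shapes; not addr2line's actual types.
struct LocationWide {
    file: u64,           // index into a file table
    line: Option<u64>,   // tagged, so 16 bytes on a 64-bit target
    column: Option<u64>,
}

struct LocationNarrow {
    file: u32,
    line: Option<NonZeroU32>,   // DWARF never emits 0, so no tag needed
    column: Option<NonZeroU32>,
}

fn main() {
    // On a typical 64-bit target this prints 40 and 12.
    println!("wide:   {} bytes", size_of::<LocationWide>());
    println!("narrow: {} bytes", size_of::<LocationNarrow>());
}
```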