Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Data Structure for Source Lines to replace SDB #880

Merged
merged 5 commits into from
Mar 30, 2021
Merged

Conversation

thestr4ng3r
Copy link
Member

@thestr4ng3r thestr4ng3r commented Mar 20, 2021

DO NOT SQUASH

Fix #843

Overview:

  • Both RzBinSourceRow and sdb_addrinfo are removed. Instead:
  • Added a new data structure RzBinSourceLineInfo for holding a large amount of addr+line+column+filename samples in a memory-efficient way with O(log n) access by address.
  • An RzBinSourceLineInfo can be conveniently constructed using the RzBinSourceLineInfoBuilder by pushing samples in an arbitrary order into it and the builder will take care of creating a valid RzBinSourceLineInfo in a final step.
  • Such an RzBinSourceLineInfo object is part of RzBinObject next to symbols, sections, etc. and can be in a bin plugin's lines function, as it is done for dex and CoreSymbolication.
  • CL which wasn't really using meta but just messing around with the sdb_addrinfo has been removed.
  • ix has been added (iX removed and integrated here):
    Usage: ix   Display source file line info (from debug info)
    | ix[j]   List all source line info available (this was CL before)
    | ix.[j]  Show source line info for current address (this was CL. before)
    | ixf[j]  Show summary of all source files used (this was iX before)
    

For #676, this reduces the load time for me from ~1m to ~45s.
Memory usage (from massif) after loading this binary:
Before:
20210325_21h14m23s_grim
After:
20210325_21h13m48s_grim
So we get ~180MB vs. ~540MB. Getting lower than this is probably impossible while maintaining efficient access.

@thestr4ng3r thestr4ng3r force-pushed the refactor-debug-lines branch 3 times, most recently from 02d9f08 to ecd8f53 Compare March 21, 2021 20:22
@XVilka XVilka added the DWARF label Mar 22, 2021
@XVilka
Copy link
Member

XVilka commented Mar 22, 2021

Would be nice to reuse it also for PDB too.

@thestr4ng3r
Copy link
Member Author

thestr4ng3r commented Mar 22, 2021

Yes, it should be used for everything eventually. But actually I am not even sure if pdb right now supports line info at all.

@thestr4ng3r thestr4ng3r force-pushed the refactor-debug-lines branch 5 times, most recently from 8cbc4f0 to f4b2abd Compare March 23, 2021 21:23
@thestr4ng3r thestr4ng3r force-pushed the refactor-debug-lines branch from f4b2abd to 9d3e28c Compare March 24, 2021 21:02
@thestr4ng3r thestr4ng3r force-pushed the refactor-debug-lines branch 4 times, most recently from 3f0221e to 132dd38 Compare March 25, 2021 19:26
@thestr4ng3r thestr4ng3r marked this pull request as ready for review March 25, 2021 20:18
@thestr4ng3r thestr4ng3r force-pushed the refactor-debug-lines branch 2 times, most recently from 0431cd7 to d7cbb2d Compare March 25, 2021 20:45
librz/bin/bfile.c Outdated Show resolved Hide resolved
@XVilka XVilka added this to the 0.2.0 milestone Mar 26, 2021
Copy link
Member

@XVilka XVilka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through and it's a good quality code! 👍
While I think the API should be more abstracted from the fact that we use DWARF, PDB, or dlSYM, it's already a huge improvement and we can merge it right away, imho. Further improvements could be done as a separate issues/PRs

@XVilka
Copy link
Member

XVilka commented Mar 26, 2021

I tried this branch on rizin itself build with debug information (rizin $(which rizin) then aaa; s main; Vp):

image

These rizin.c34rizin.c strings do not look very good. Could you please address this problem too?

@thestr4ng3r
Copy link
Member Author

These rizin.c34rizin.c strings do not look very good. Could you please address this problem too?

Yes. The main problem here is that we don't have any tests for this (yet) as it depends on your local filesystem contents.

@thestr4ng3r
Copy link
Member Author

While I think the API should be more abstracted from the fact that we use DWARF, PDB, or dlSYM,...

Added detailed info about going more format-agnostic here: #907 (comment)

librz/bin/dbginfo.c Outdated Show resolved Hide resolved
@thestr4ng3r thestr4ng3r force-pushed the refactor-debug-lines branch from d7cbb2d to 1d40ebd Compare March 26, 2021 09:48
@thestr4ng3r
Copy link
Member Author

line - An unsigned integer indicating a source line number. Lines are numbered beginning at 1. The compiler may emit the value 0 in cases where an instruction cannot be attributed to any source line.

This isn't handled nicely yet, I have to fix it.

@thestr4ng3r thestr4ng3r force-pushed the refactor-debug-lines branch from 1d40ebd to 9ac0ca0 Compare March 26, 2021 10:51
@thestr4ng3r
Copy link
Member Author

All fixed, including the ugly rizin.c34rizin.c comments.

@XVilka
Copy link
Member

XVilka commented Mar 26, 2021

Two failing tests:



[XX] db/formats/mach0/coresymbolication Xcode symbols cache new (line numbers)
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc 'ix
?e --
ix.@`is~viewDidLoad[2]`
ix.@0x100006430
' bins/mach0/D9A37B67-F10A-35AA-A852-ABFBBECC03AE-new.symbols
-- stdout
@@ -5,7 +5,6 @@
 0x100006404	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	27
 0x100006420	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	30
 0x100006430	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	30
-0x100006440	-	-
 0x100006440	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	-
 0x100006444	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	30
 0x100006454	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	30
@@ -30,6 +29,7 @@
 0x100006528	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.m	51
 0x10000652c	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.h	13
 0x10000653c	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.h	13
+0x100006540	-	-
 0x100006540	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.h	-
 0x100006550	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.m	15
 0x100006564	-	-




[XX] db/formats/mach0/coresymbolication Xcode symbols cache (line numbers)
RZ_NOPLUGINS=1 rizin -escr.utf8=0 -escr.color=0 -escr.interactive=0 -N -Qc 'ix
?e --
ix.@`is~viewDidLoad[2]`
ix.@0x100006430
' bins/mach0/D9A37B67-F10A-35AA-A852-ABFBBECC03AE.symbols
-- stdout
@@ -5,7 +5,6 @@
 0x100006404	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	27
 0x100006420	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	30
 0x100006430	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	30
-0x100006440	-	-
 0x100006440	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	-
 0x100006444	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	30
 0x100006454	/Users/ftamagni/src/TestRTTI/TestRTTI/AppDelegate.m	30
@@ -30,6 +29,7 @@
 0x100006528	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.m	55
 0x10000652c	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.h	13
 0x10000653c	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.h	13
+0x100006540	-	-
 0x100006540	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.h	-
 0x100006550	/Users/ftamagni/src/TestRTTI/TestRTTI/SceneDelegate.m	15
 0x100006564	-	-

@thestr4ng3r thestr4ng3r force-pushed the refactor-debug-lines branch from 9ac0ca0 to f6b0f04 Compare March 26, 2021 12:39
@thestr4ng3r
Copy link
Member Author

Two failing tests:

Fixed and updated unit tests to catch this.

@thestr4ng3r thestr4ng3r requested a review from ret2libc March 27, 2021 09:29
Copy link
Member

@ret2libc ret2libc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't review everything :P I hope it won't break anything. I couldn't see the code coverage for this branch on codecov though. It would help to see which part are not covered at all by tests.

@@ -939,14 +947,18 @@ RZ_API void rz_bin_dwarf_loc_free(HtUP /*<offset, RzBinDwarfLocList*>*/ *loc_tab
RZ_API void rz_bin_dwarf_debug_info_free(RzBinDwarfDebugInfo *inf);
RZ_API void rz_bin_dwarf_debug_abbrev_free(RzBinDwarfDebugAbbrev *da);

RZ_API RzList *rz_bin_dwarf_parse_line(RzBinFile *binfile, RZ_NULLABLE RzBinDwarfDebugInfo *info, RzBinDwarfLineInfoMask mask);
RZ_API char *rz_bin_dwarf_line_header_get_full_file_path(RZ_NULLABLE RzBinDwarfDebugInfo *info, const RzBinDwarfLineHeader *header, ut64 file_index);
typedef char **RzBinDwarfLineFileCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this represent? A small doxygen-doc to explain it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets merge this and test a bit before the release, not to be untested. I think @thestr4ng3r could add a comment in a separate commit.

@XVilka XVilka merged commit 63891a1 into dev Mar 30, 2021
@XVilka XVilka deleted the refactor-debug-lines branch March 30, 2021 05:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor RzBinFile.sdb_addrinfo out of Sdb
4 participants