Symbols are being double counted. #962

Closed
fzakaria opened this issue Sep 10, 2023 · 9 comments

@fzakaria
Contributor

fzakaria commented Sep 10, 2023

Describe the bug
Here is a sample Makefile:

all: exe

# This library depends on the libx2.so soname and calls h() from it
y/liby.so: x/libx2.so
	@mkdir -p $(dir $@)
	echo 'extern int h(); int g() { return h(); }' | $(CC) -o $@ -shared -x c - -Lx -l:libx2.so '-Wl,--no-as-needed,--enable-new-dtags,-rpath,$$ORIGIN/../x'

# This library has both file and soname libx.so
x/libx.so:
	@mkdir -p $(dir $@)
	echo 'int h(){return 12;}' | $(CC) -o $@ -shared -x c -

# This library has both file and soname libx2.so
x/libx2.so:
	@mkdir -p $(dir $@)
	echo 'int h(){return 1000;}' | $(CC) -o $@ -shared -x c -

# This links to y/liby.so and x/libx.so, and gets libx.so and liby.so in DT_NEEDED, no paths.
exe: y/liby.so x/libx.so
	echo 'extern int g(); extern int h(); int main(){ printf("\%d\n", g() + h()); }' | \
	$(CC) -o $@ -include stdio.h -x c - -Ly -Lx -l:liby.so '-Wl,--no-as-needed,--enable-new-dtags,-rpath,$$ORIGIN/y' \
		  -l:libx.so '-Wl,--no-as-needed,--enable-new-dtags,-rpath,$$ORIGIN/x'

clean:
	rm -rf -- x y exe

When I load one of the shared objects (libx2.so) into LIEF, it double counts the symbols and I'm not sure why.
You can see here that I'm using inspect.getmembers to try and find the difference between them but they look identical.

>>> import lief
>>> binary = lief.parse('/tmp/example/x/libx2.so')
>>> symbols = [symbol for symbol in binary.symbols]
>>> h_syms = list(filter(lambda sym: sym.name == 'h', symbols))
>>> len(h_syms)
2

>>> import inspect
>>> inspect.getmembers(h_syms[0])
[('__class__', <class 'lief._lief.ELF.Symbol'>), ('__delattr__', <method-wrapper '__delattr__' of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__dir__', <built-in method __dir__ of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__doc__', '\n    "Class which represents an ELF symbol"\n    '), ('__eq__', <bound method PyCapsule.__eq__ of <lief._lief.ELF.Symbol object at 0x7f40d417ebf0>>), ('__format__', <built-in method __format__ of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__ge__', <method-wrapper '__ge__' of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__getattribute__', <method-wrapper '__getattribute__' of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__gt__', <method-wrapper '__gt__' of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__hash__', <bound method PyCapsule.__hash__ of <lief._lief.ELF.Symbol object at 0x7f40d417ebf0>>), ('__init__', <bound method PyCapsule.__init__ of <lief._lief.ELF.Symbol object at 0x7f40d417ebf0>>), ('__init_subclass__', <built-in method __init_subclass__ of pybind11_type object at 0x1c7f100>), ('__le__', <method-wrapper '__le__' of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__lt__', <method-wrapper '__lt__' of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__module__', 'lief._lief.ELF'), ('__ne__', <bound method PyCapsule.__ne__ of <lief._lief.ELF.Symbol object at 0x7f40d417ebf0>>), ('__new__', <built-in method __new__ of pybind11_type object at 0x1b95680>), ('__reduce__', <built-in method __reduce__ of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__reduce_ex__', <built-in method __reduce_ex__ of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__repr__', <method-wrapper '__repr__' of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__setattr__', <method-wrapper '__setattr__' of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__sizeof__', <built-in method __sizeof__ of lief._lief.ELF.Symbol object at 0x7f40d417ebf0>), ('__str__', <bound method PyCapsule.__str__ 
of <lief._lief.ELF.Symbol object at 0x7f40d417ebf0>>), ('__subclasshook__', <built-in method __subclasshook__ of pybind11_type object at 0x1c7f100>), ('binding', <SYMBOL_BINDINGS.GLOBAL: 1>), ('demangled_name', 'unsigned char'), ('exported', True), ('has_version', False), ('imported', False), ('information', 18), ('is_function', True), ('is_static', True), ('is_variable', False), ('name', 'h'), ('other', 0), ('section', None), ('shndx', 9), ('size', 11), ('symbol_version', None), ('type', <SYMBOL_TYPES.FUNC: 2>), ('value', 4345), ('visibility', <SYMBOL_VISIBILITY.DEFAULT: 0>)]
>>> inspect.getmembers(h_syms[1])
[('__class__', <class 'lief._lief.ELF.Symbol'>), ('__delattr__', <method-wrapper '__delattr__' of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__dir__', <built-in method __dir__ of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__doc__', '\n    "Class which represents an ELF symbol"\n    '), ('__eq__', <bound method PyCapsule.__eq__ of <lief._lief.ELF.Symbol object at 0x7f40d4f02530>>), ('__format__', <built-in method __format__ of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__ge__', <method-wrapper '__ge__' of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__getattribute__', <method-wrapper '__getattribute__' of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__gt__', <method-wrapper '__gt__' of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__hash__', <bound method PyCapsule.__hash__ of <lief._lief.ELF.Symbol object at 0x7f40d4f02530>>), ('__init__', <bound method PyCapsule.__init__ of <lief._lief.ELF.Symbol object at 0x7f40d4f02530>>), ('__init_subclass__', <built-in method __init_subclass__ of pybind11_type object at 0x1c7f100>), ('__le__', <method-wrapper '__le__' of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__lt__', <method-wrapper '__lt__' of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__module__', 'lief._lief.ELF'), ('__ne__', <bound method PyCapsule.__ne__ of <lief._lief.ELF.Symbol object at 0x7f40d4f02530>>), ('__new__', <built-in method __new__ of pybind11_type object at 0x1b95680>), ('__reduce__', <built-in method __reduce__ of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__reduce_ex__', <built-in method __reduce_ex__ of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__repr__', <method-wrapper '__repr__' of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__setattr__', <method-wrapper '__setattr__' of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__sizeof__', <built-in method __sizeof__ of lief._lief.ELF.Symbol object at 0x7f40d4f02530>), ('__str__', <bound method PyCapsule.__str__ 
of <lief._lief.ELF.Symbol object at 0x7f40d4f02530>>), ('__subclasshook__', <built-in method __subclasshook__ of pybind11_type object at 0x1c7f100>), ('binding', <SYMBOL_BINDINGS.GLOBAL: 1>), ('demangled_name', 'unsigned char'), ('exported', True), ('has_version', False), ('imported', False), ('information', 18), ('is_function', True), ('is_static', True), ('is_variable', False), ('name', 'h'), ('other', 0), ('section', <lief._lief.ELF.Section object at 0x7f40d4ec6fb0>), ('shndx', 9), ('size', 11), ('symbol_version', None), ('type', <SYMBOL_TYPES.FUNC: 2>), ('value', 4345), ('visibility', <SYMBOL_VISIBILITY.DEFAULT: 0>)]

>>> h_syms[0].__str__()
'unsigned char                 FUNC      GLOBAL    10f9      b         '
>>> h_syms[1].__str__()
'unsigned char                 FUNC      GLOBAL    10f9      b         '

Here is the output of nm:

❯ nm -g /tmp/example/x/libx2.so
                 w __cxa_finalize
                 w __gmon_start__
00000000000010f9 T h
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable

To Reproduce
Run the makefile provided and inspect.

Expected behavior
There should only be a single h

Environment (please complete the following information):

  • System and Version : [e.g. Ubuntu 16.04]
  • Target format ELF
  • LIEF commit version: python -c "import lief;print(lief.__version__)" or the one from LIEF/config.h
❯ python -c "import lief;print(lief.__version__)"
0.13.2-2d9855fc
@fzakaria
Contributor Author

fzakaria commented Sep 10, 2023

Okay I found the delta but I think it's still a bug.
I uploaded the diff: https://www.diffchecker.com/OztiUR4A

Looks like shndx is the same for both but not the section property.
Not sure why this causes two distinct symbols in the list.

@romainthomas
Member

Actually, your binary has two symbol tables: .dynsym and .symtab.
As stated in the documentation, .symbols returns "an iterator over both static and dynamic Symbol".

So yes, you can observe the same symbol twice because of these two tables. You can use
.static_symbols or .dynamic_symbols depending on your needs.
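
As an illustration of iterating the two tables separately, here is a minimal sketch that keeps the table of origin next to each symbol. It assumes the parsed binary exposes the `static_symbols` and `dynamic_symbols` properties named above; the helper `symbols_by_table` and the stand-in object are hypothetical, used so the snippet runs without LIEF installed.

```python
from types import SimpleNamespace


def symbols_by_table(binary):
    """Yield (table_name, symbol) pairs so the table of origin stays explicit.

    Assumes `binary` has `static_symbols` (.symtab) and `dynamic_symbols`
    (.dynsym), as mentioned in this thread. The helper itself is hypothetical.
    """
    for sym in binary.static_symbols:
        yield ".symtab", sym
    for sym in binary.dynamic_symbols:
        yield ".dynsym", sym


# Stand-in shaped like the libx2.so case from this issue. With LIEF installed,
# pass lief.parse("/tmp/example/x/libx2.so") instead of this fake object.
fake_binary = SimpleNamespace(
    static_symbols=[SimpleNamespace(name="h", value=0x10F9)],
    dynamic_symbols=[SimpleNamespace(name="h", value=0x10F9)],
)

h_entries = [
    (table, sym.name)
    for table, sym in symbols_by_table(fake_binary)
    if sym.name == "h"
]
print(h_entries)  # [('.symtab', 'h'), ('.dynsym', 'h')]
```

With the table name carried alongside each symbol, the two `h` entries are no longer indistinguishable.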

@fzakaria
Contributor Author

ok that makes sense.
I will need to make sure I have the column is_static in my object.

@romainthomas but both symbols have is_static: True (see the diff output)

property is_static
True if the symbol is a static one (i.e. from the .symtab section)

One of them should be False, correct?

@fzakaria
Contributor Author

Adding more debugging. Here is the output stripped from readelf:

Symbol table '.dynsym' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     5: 00000000000010f9    11 FUNC    GLOBAL DEFAULT    9 h

Symbol table '.symtab' contains 25 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
    23: 00000000000010f9    11 FUNC    GLOBAL DEFAULT    9 h

I can see both have the binding GLOBAL, which is why is_static is True for both. In the code it is computed as:

return this->binding() == SYMBOL_BINDINGS::STB_GLOBAL;

  1. Should is_static instead indicate whether the symbol actually came from the .dynsym?
  2. Why does one of the symbols not have the section property set but the other does? (This might be related to "section() in ELF/Symbols.cpp never set" #959; I didn't try the latest main.)
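
For context on why both entries look the same: in the ELF symbol structure the binding and the type are packed into the single `st_info` byte, binding in the high four bits and type in the low four bits. The sketch below decodes the value 18 that both `h` symbols report as `information` in the dump above, assuming that property is the raw `st_info` byte. The function and lookup tables are illustrative, not LIEF API.

```python
def decode_st_info(info):
    """Split an ELF st_info byte into (binding, type).

    Mirrors the ELF macros ST_BIND(i) = i >> 4 and ST_TYPE(i) = i & 0xf.
    """
    return info >> 4, info & 0xF


# Small subset of the standard ELF constants, enough for this example.
STB_NAMES = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
STT_NAMES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}

binding, sym_type = decode_st_info(18)
print(STB_NAMES[binding], STT_NAMES[sym_type])  # GLOBAL FUNC
```

This matches the `FUNC GLOBAL` columns in the readelf output for both tables. Nothing in the byte records which table the entry was read from.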

@romainthomas
Member

There are no symbol attributes that distinguish .dynsym from .symtab: both tables use the same raw C structures.
The only way is to iterate over each table separately.
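
If the table of origin does not matter for a given use, one possible workaround is to collapse entries whose raw fields are identical. This is a sketch of that idea, not something LIEF provides. The key fields (`name`, `value`, `size`, `shndx`) are taken from the attribute dump earlier in this issue, and `unique_symbols` is a hypothetical helper.

```python
from types import SimpleNamespace


def unique_symbols(symbols):
    """Keep the first symbol seen for each (name, value, size, shndx) key.

    Hypothetical helper: it discards the information about which table
    each entry came from, so only use it when that does not matter.
    """
    seen = set()
    unique = []
    for sym in symbols:
        key = (sym.name, sym.value, sym.size, sym.shndx)
        if key not in seen:
            seen.add(key)
            unique.append(sym)
    return unique


# Stand-ins for the two `h` entries reported in this issue (value 0x10f9,
# size 11, section index 9), one per table.
h_static = SimpleNamespace(name="h", value=0x10F9, size=11, shndx=9)
h_dynamic = SimpleNamespace(name="h", value=0x10F9, size=11, shndx=9)

merged = unique_symbols([h_static, h_dynamic])
print(len(merged))  # 1
```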

romainthomas closed this as not planned Sep 10, 2023
@romainthomas
Member

@romainthomas but both symbols have is_static: True (see the diff output)

is_static refers to the static C-visibility

@romainthomas
Member

Why does one of the symbols not have the section property set but the other does? (might be related to #959 didn't try the latest main)

h = elf.get_dynamic_symbol("h")
print(h.section)

h = elf.get_static_symbol("h")
print(h.section)

both are set

@fzakaria
Contributor Author

fzakaria commented Sep 10, 2023

@romainthomas can we keep this open? There is some bug I believe.

Here you can see that if I get the symbols via .symbols property it resolves differently.

❯ python
Python 3.10.12 (main, Jun  6 2023, 22:43:10) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lief
>>> binary = lief.parse("/tmp/example/x/libx2.so")
>>> foo_syms = list(filter(lambda sym: sym.name == "h", binary.symbols))
>>> foo_syms[0].section
>>> foo_syms[1].section
<lief._lief.ELF.Section object at 0x7f5867ee1df0>
>>> foo_syms[0].section == None
True

The comment may need to change to just reflect binding?

For instance, I am debugging a file with no .symtab (it was stripped out), yet in the .dynsym the binding is GLOBAL, so is_static is still True.

I will use the property access method itself to distinguish them as well.

Edit:
Since dynsym is just a smaller subset of the entries in the main symbol table (subset) -- my code will check if .symtab is present and use that ONLY otherwise use .dynsym -- so basically try to select the larger set first.

Edit2:
I will work on moving my code to latest main also -- to validate that this is present and unrelated to the other bug.

Thank you for your responses.

@romainthomas
Member

The comment may need to change to just reflect binding?

Indeed I'll fix it.

Here you can see that if I get the symbols via .symbols property it resolves differently.

I think you are not using the version with the previous fix (8ef47cd)
You can check that your version is after or equal to this commit (lief.__version__)
On my end:

import lief
binary = lief.parse("./x/libx.so")
foo_syms = list(filter(lambda sym: sym.name == "h", binary.symbols))
print(foo_syms[0].section.name)
.text

Since dynsym is just a smaller subset of the entries in the main symbol table (subset) -- my code will check if .symtab is present and use that ONLY otherwise use .dynsym -- so basically try to select the larger set first.

Make sure to understand the difference between .dynsym and .symtab. The reliable one is .dynsym; .symtab cannot be trusted on weird binaries. If your input ELF files are trustworthy then yes, .symtab can be a good option.
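
The selection policy discussed here could be sketched as follows: default to .dynsym, and only prefer .symtab when the caller explicitly trusts the input and the table is not empty. `pick_symbols` and its `trust_symtab` flag are hypothetical names. The sketch assumes `static_symbols` yields nothing for a stripped binary, and uses stand-in objects so it runs without LIEF.

```python
from types import SimpleNamespace


def pick_symbols(binary, trust_symtab=False):
    """Return the list of symbols to work with.

    Defaults to .dynsym. Uses .symtab only when the caller trusts the
    input and the table is non-empty (it may have been stripped).
    """
    if trust_symtab:
        static = list(binary.static_symbols)
        if static:
            return static
    return list(binary.dynamic_symbols)


# Stand-ins sized like the readelf output in this issue: 25 entries in
# .symtab and 6 in .dynsym. The stripped variant has no .symtab entries.
full = SimpleNamespace(
    static_symbols=[SimpleNamespace(name=f"s{i}") for i in range(25)],
    dynamic_symbols=[SimpleNamespace(name=f"d{i}") for i in range(6)],
)
stripped = SimpleNamespace(static_symbols=[], dynamic_symbols=full.dynamic_symbols)

print(len(pick_symbols(full)))                         # 6
print(len(pick_symbols(full, trust_symtab=True)))      # 25
print(len(pick_symbols(stripped, trust_symtab=True)))  # 6
```

Making trust an explicit opt-in keeps the default on the table that the dynamic loader actually uses.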

romainthomas added a commit that referenced this issue Sep 10, 2023
fzakaria added a commit to fzakaria/sqlelf that referenced this issue Sep 11, 2023
As a response to lief-project/LIEF#962 only
iterate static or dynamic symbols.

Added a simple Makefile to demonstrate shadowing variables.