Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling postject_find_resource() segfaults on rhel8-ppc64le #70

Closed
RaisinTen opened this issue Feb 1, 2023 · 6 comments · Fixed by #77
Closed

Calling postject_find_resource() segfaults on rhel8-ppc64le #70

RaisinTen opened this issue Feb 1, 2023 · 6 comments · Fixed by #77

Comments

@RaisinTen
Copy link
Contributor

RaisinTen commented Feb 1, 2023

The crash originates from here:

if (note->n_namesz != 0 && note->n_descsz != 0 &&

while dereferencing the note pointer. Note that note is not a null pointer here.

This is happening for the case where the resource hasn't been injected into the node binary.

This is one of the blockers for the single-executable PR in core - nodejs/node#45038. I think calling postject_has_resource() first would also unblock that PR.

Refs: nodejs/build#3168

@RaisinTen
Copy link
Contributor Author

The implementation should be something like https://github.com/iovisor/bpftrace/blob/1326f040a0f88287ccbc8c18fe8956bca4cc225d/src/utils.cpp#L1017-L1050. I'll see if I can find any obvious differences. Meanwhile maybe @dsanders11 and @robertgzr could help?

@RaisinTen
Copy link
Contributor Author

Also, cc @nodejs/single-executable if anyone else also has any clue

RaisinTen added a commit to RaisinTen/node that referenced this issue Feb 3, 2023
This also serves as a workaround for nodejs/postject#70.

Signed-off-by: Darshan Sen <[email protected]>
RaisinTen added a commit to RaisinTen/node that referenced this issue Feb 11, 2023
This also serves as a workaround for nodejs/postject#70.

Signed-off-by: Darshan Sen <[email protected]>
@dsanders11
Copy link
Contributor

The implementation should be something like https://github.com/iovisor/bpftrace/blob/1326f040a0f88287ccbc8c18fe8956bca4cc225d/src/utils.cpp#L1017-L1050. I'll see if I can find any obvious differences. Meanwhile maybe @dsanders11 and @robertgzr could help?

That implementation is a bit different - it's looking for SHT_NOTE using the section header table (SHT), while Postject's implementation uses PT_NOTE, which is a note segment. Sections are contained within segments, but they're a linker-time concept. The SHT is not used at run time and can be stripped from the executable. The the Wiki article on ELF:

The segments contain information that is needed for run time execution of the file, while sections contain important data for linking and relocation.

While SHT_NOTE sections will exist inside of a PT_NOTE segment, you can't rely on the SHT to find them at run time since that information may be stripped, so Postject walks the segments, rather than sections.


I don't see anything obvious, so I'll try to dig into this later and see what I can find. There might be some slight difference on ppc64le that's not being accounted for in the current implementation which leads to using the wrong offset for the pointers values.

RaisinTen added a commit to RaisinTen/node that referenced this issue Feb 17, 2023
This also serves as a workaround for nodejs/postject#70.

Signed-off-by: Darshan Sen <[email protected]>
@RaisinTen
Copy link
Contributor Author

RaisinTen commented Feb 23, 2023

Hmm, weird find - I'm able to reproduce this on Linux when I compile this on an x64 Ubuntu Linux:

test.cc
#include <iostream>
#include <string>

#include "postject-api.h"

int main() {
  size_t size = 0;

  if (postject_has_resource()) {
    const void* ptr = postject_find_resource("foobar", &size, nullptr);
    if (ptr == NULL) {
      std::cerr << "ptr must not be NULL." << std::endl;
      exit(1);
    }
    if (size == 0) {
      std::cerr << "size must not be 0." << std::endl;
      exit(1);
    }
    std::cout << std::string(static_cast<const char*>(ptr), size) << std::endl;
  } else {
    const void* ptr = postject_find_resource("foobar", &size, nullptr); // <- this call segfaults
    if (ptr != nullptr) {
      std::cerr << "ptr must be nullptr." << std::endl;
      exit(1);
    }
    if (size > 0) {
      std::cerr << "size must not be greater than 0." << std::endl;
      exit(1);
    }
    std::cout << "Hello world" << std::endl;
  }

  return 0;
}

(postject-api.h - from https://github.com/nodejs/postject/blob/35343439cac8c488f2596d7c4c1dddfec1fddcae/postject-api.h)

using clang but it works fine with gcc. 🤔

$ g++ test.cc 
$ ./a.out 
Hello world
$ clang++ test.cc 
$ ./a.out 
Segmentation fault (core dumped)
$ clang++ -g test.cc 
$ gdb -q a.out
Reading symbols from a.out...
(gdb) run
Starting program: /home/parallels/Desktop/temp/project/trash/a.out 

Program received signal SIGSEGV, Segmentation fault.
0x00000000004015f0 in postject_find_resource (name=0x402004 "foobar", size=0x7fffffffdf00, options=0x0) at ./postject-api.h:141
141	      if (note->n_namesz != 0 && note->n_descsz != 0 &&
(gdb) bt
#0  0x00000000004015f0 in postject_find_resource (name=0x402004 "foobar", size=0x7fffffffdf00, options=0x0) at ./postject-api.h:141
#1  0x0000000000401401 in main () at test.cc:21
(gdb) quit
A debugging session is active.

	Inferior 1 [process 187727] will be killed.

Quit anyway? (y or n) y

System info:

$ uname -a
Linux parallels-Parallels-Virtual-Platform 5.13.0-40-generic #45~20.04.1-Ubuntu SMP Mon Apr 4 09:38:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ clang++ --version
clang version 10.0.0-4ubuntu1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

@RaisinTen
Copy link
Contributor Author

FWIW, I tried using dl_iterate_phdr to implement the runtime API for Linux using https://github.com/percona/percona-server/blob/5486efdbebd4e9a6fd94af5410853137a73d551b/mysys/build_id.cc as the base and it doesn't segfault when I compile with clang++.

@dsanders11 I'll send a PR for this soon if you're not aware of anything obviously wrong with function which I haven't considered.

RaisinTen added a commit that referenced this issue Feb 27, 2023
The program headers base address values returned by `getauxval(AT_PHDR)`
and `dl_iterate_phdr()` are identical only on
`g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0`. However, the values are
totally different on `clang version 10.0.0-4ubuntu1` and
`g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-16)`.
Since the `dl_iterate_phdr()` approach seems to work for all the 3
compilers, I think we should proceed with that.

Fixes: #70
Refs: #76
Signed-off-by: Darshan Sen <[email protected]>
@RaisinTen
Copy link
Contributor Author

Fix - #77

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants