Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault #1942

Open
honggyukim opened this issue Jul 17, 2024 · 8 comments
Open

Segmentation fault #1942

honggyukim opened this issue Jul 17, 2024 · 8 comments
Labels

Comments

@honggyukim
Copy link
Collaborator

honggyukim commented Jul 17, 2024

The segfault can be reproduced as follows.

$ git clone https://github.com/MatthewKhouzam/sum_of_squares.git
$ cd sum_of_squares
$ git checkout 0debd1aa615e5ff0dabc4d4631628b08b91701cf

$ gcc -pthread  -Wall -O2 -o sum_of_squares-O2-pg  sum_of_squares.c -pg

$ uftrace record ./sum_of_squares-O2-pg 
WARN: Segmentation fault: address not mapped (addr: 0x6697cd78)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-11-g804a ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [0] (compute_sum_of_squares[5884b27224c5] <= start_thread[73881589ca94])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault
@honggyukim honggyukim added the bug label Jul 17, 2024
@honggyukim
Copy link
Collaborator Author

More debugging information is here.

$ gdb -q --args uftrace record ./sum_of_squares-O2-pg 
Reading symbols from uftrace...
(gdb) r
Starting program: /home/honggyu/usr/bin/uftrace record ./sum_of_squares-O2-pg
        ...
Thread 2.2 "sum_of_squares-" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6e006c0 (LWP 10869)]
0x00007ffff7e5a49f in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
(gdb) bt
#0  0x00007ffff7e5a49f in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#1  0x00007ffff7e5b6aa in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#2  0x00007ffff7e5bde0 in _Unwind_ForcedUnwind () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x00007ffff7ca5794 in __GI___pthread_unwind (buf=<optimized out>) at ./nptl/unwind.c:130
#4  0x00007ffff7c9dd12 in __do_cancel () at ../sysdeps/nptl/pthreadP.h:271
#5  __GI___pthread_exit (value=0x0) at ./nptl/pthread_exit.c:36
#6  0x00007ffff7f7d29a in pthread_exit (retval=0x0) at /home/honggyu/work/uftrace/libmcount/wrap.c:529
#7  0x00007ffff7f9f53d in plt_hooker () at /home/honggyu/work/uftrace/arch/x86_64/plthook.S:68
#8  0x000000006697cf0b in ?? ()
#9  0x0000000000011183 in ?? ()
#10 0x000000006697cf0c in ?? ()
#11 0x000000000007f7d0 in ?? ()
#12 0x0000000000000000 in ?? ()

(gdb) frame 6
#6  0x00007ffff7f7d29a in pthread_exit (retval=0x0) at /home/honggyu/work/uftrace/libmcount/wrap.c:529
529             real_pthread_exit(retval);

(gdb) up
#7  0x00007ffff7f9f53d in plt_hooker () at /home/honggyu/work/uftrace/arch/x86_64/plthook.S:68
68              jmp *%r11

@honggyukim
Copy link
Collaborator Author

honggyukim commented Jul 17, 2024

Reported by @MatthewKhouzam

@yskelg
Copy link
Contributor

yskelg commented Jul 18, 2024

Solution 1

Simply resolve the issue, removing pthread_exit from resolve_syms[] table in libmcount/plthook.c.

diff --git a/libmcount/plthook.c b/libmcount/plthook.c
index 8c8ba7ea..82b95b2c 100644
--- a/libmcount/plthook.c
+++ b/libmcount/plthook.c
@@ -452,7 +452,7 @@ static const char *except_syms[] = {
 
 static const char *resolve_syms[] = {
        "execl",   "execlp",  "execle",      "execv",        "execve",       "execvp",
-       "execvpe", "fexecve", "posix_spawn", "posix_spawnp", "pthread_exit",
+       "execvpe", "fexecve", "posix_spawn", "posix_spawnp",
 };

Solution 2

This solution makes not reference the rstack value of the mtdp.

diff --git a/libmcount/wrap.c b/libmcount/wrap.c
index c52e91a1..c119b4cc 100644
--- a/libmcount/wrap.c
+++ b/libmcount/wrap.c
@@ -504,23 +504,21 @@ __visible_default void *dlopen(const char *filename, int flags)
 __visible_default __noreturn void pthread_exit(void *retval)
 {
        struct mcount_thread_data *mtdp;
-       struct mcount_ret_stack *rstack;
 
        if (unlikely(real_pthread_exit == NULL))
                mcount_hook_functions();
 
        mtdp = get_thread_data();
-       if (!mcount_estimate_return && !check_thread_data(mtdp)) {
-               rstack = &mtdp->rstack[mtdp->idx - 1];
-               /* record the final call */
-               mcount_exit_filter_record(mtdp, rstack, NULL);
+       if (!check_thread_data(mtdp)) {
+               pr_dbg2("%s: exception resumed on [%d]\n", __func__, mtdp->idx);
+
+               mtdp->in_exception = true;
 
                /*
-                * it won't return to the caller ("noreturn"),
-                * do not try to restore the address..
+                * restore return addresses so that it can unwind stack
+                * frames safely during the exception handling.
+                * It pairs to mcount_rstack_reset_exception().
                 */
-               mtdp->idx--;
-
                mcount_rstack_restore(mtdp);
        }

@MichelleJin12 and I have been investigating the root cause further.

We are now checking the trampolined function pthread_exit() in libmcount/wrap.c, looking up struct mcount_thread_data *mtdp, struct mcount_ret_stack *rstack stack value.

This is result from present state.

[New Thread 0x7ffff61a6700 (LWP 1204)]
Time for 1 workers (wall time, total workers time): 11.677786 / 11.677139
[Thread 0x7ffff61a6700 (LWP 1204) exited]
[New Thread 0x7ffff61a6700 (LWP 1370)]
[New Thread 0x7ffff59a5700 (LWP 1371)]
[Thread 0x7ffff61a6700 (LWP 1370) exited]
Time for 2 workers (wall time, total workers time): 5.861623 / 11.682094
[Thread 0x7ffff59a5700 (LWP 1371) exited]
[New Thread 0x7ffff59a5700 (LWP 3861)]
[New Thread 0x7ffff61a6700 (LWP 3862)]
[New Thread 0x7ffff51a4700 (LWP 3863)]
[Thread 0x7ffff61a6700 (LWP 3862) exited]
[Thread 0x7ffff51a4700 (LWP 3863) exited]
Time for 3 workers (wall time, total workers time): 3.901969 / 11.685607
[Thread 0x7ffff59a5700 (LWP 3861) exited]
[New Thread 0x7ffff51a4700 (LWP 3912)]
[New Thread 0x7ffff61a6700 (LWP 3913)]
[New Thread 0x7ffff59a5700 (LWP 3914)]
[New Thread 0x7ffff49a3700 (LWP 3915)]
[Thread 0x7ffff61a6700 (LWP 3913) exited]
[Thread 0x7ffff59a5700 (LWP 3914) exited]
[Thread 0x7ffff51a4700 (LWP 3912) exited]
Time for 4 workers (wall time, total workers time): 2.943064 / 11.690259
[Thread 0x7ffff49a3700 (LWP 3915) exited]
[New Thread 0x7ffff49a3700 (LWP 4217)]
[New Thread 0x7ffff59a5700 (LWP 4218)]
[New Thread 0x7ffff61a6700 (LWP 4219)]
[New Thread 0x7ffff51a4700 (LWP 4220)]
[New Thread 0x7fffe7fff700 (LWP 4221)]
[Thread 0x7ffff59a5700 (LWP 4218) exited]
[Thread 0x7ffff49a3700 (LWP 4217) exited]
[Thread 0x7ffff61a6700 (LWP 4219) exited]
[Thread 0x7ffff51a4700 (LWP 4220) exited]
Time for 5 workers (wall time, total workers time): 2.378430 / 11.712168
[Thread 0x7fffe7fff700 (LWP 4221) exited]
[New Thread 0x7fffe7fff700 (LWP 4692)]
[New Thread 0x7ffff51a4700 (LWP 4693)]
[New Thread 0x7ffff61a6700 (LWP 4694)]
[New Thread 0x7ffff59a5700 (LWP 4695)]
[New Thread 0x7ffff49a3700 (LWP 4696)]
[New Thread 0x7fffe77fe700 (LWP 4697)]
[Thread 0x7fffe7fff700 (LWP 4692) exited]
[Thread 0x7fffe77fe700 (LWP 4697) exited]
[Thread 0x7ffff49a3700 (LWP 4696) exited]
[Thread 0x7ffff59a5700 (LWP 4695) exited]
[Thread 0x7ffff51a4700 (LWP 4693) exited]
Time for 6 workers (wall time, total workers time): 1.943285 / 11.592973
[Thread 0x7ffff61a6700 (LWP 4694) exited]
[New Thread 0x7fffe77fe700 (LWP 4767)]
[New Thread 0x7ffff49a3700 (LWP 4768)]
[New Thread 0x7ffff59a5700 (LWP 4769)]
[New Thread 0x7ffff61a6700 (LWP 4770)]
[New Thread 0x7ffff51a4700 (LWP 4771)]
[New Thread 0x7fffe7fff700 (LWP 4772)]
[New Thread 0x7fffe6ffd700 (LWP 4773)]
[Thread 0x7ffff59a5700 (LWP 4769) exited]
[Thread 0x7ffff49a3700 (LWP 4768) exited]
[Thread 0x7fffe6ffd700 (LWP 4773) exited]
[Thread 0x7fffe77fe700 (LWP 4767) exited]
[Thread 0x7fffe7fff700 (LWP 4772) exited]
[Thread 0x7ffff61a6700 (LWP 4770) exited]
Time for 7 workers (wall time, total workers time): 1.697712 / 11.813175
[Thread 0x7ffff51a4700 (LWP 4771) exited]
[New Thread 0x7fffe6ffd700 (LWP 4790)]
[New Thread 0x7fffe7fff700 (LWP 4791)]
[New Thread 0x7ffff51a4700 (LWP 4792)]
[New Thread 0x7ffff61a6700 (LWP 4793)]
[New Thread 0x7ffff59a5700 (LWP 4794)]
[New Thread 0x7ffff49a3700 (LWP 4802)]
[New Thread 0x7fffe77fe700 (LWP 4803)]
[New Thread 0x7fffe67fc700 (LWP 4804)]
[Thread 0x7ffff61a6700 (LWP 4793) exited]
[Thread 0x7fffe7fff700 (LWP 4791) exited]
[Thread 0x7fffe67fc700 (LWP 4804) exited]
[Thread 0x7fffe6ffd700 (LWP 4790) exited]
[Thread 0x7fffe77fe700 (LWP 4803) exited]
[Thread 0x7ffff51a4700 (LWP 4792) exited]
[Thread 0x7ffff49a3700 (LWP 4802) exited]
[Thread 0x7ffff59a5700 (LWP 4794) exited]
Time for 8 workers (wall time, total workers time): 1.503176 / 11.952770
[New Thread 0x7fffe67fc700 (LWP 4862)]
[New Thread 0x7fffe77fe700 (LWP 4864)]
[New Thread 0x7ffff49a3700 (LWP 4865)]
[New Thread 0x7ffff59a5700 (LWP 4866)]
[New Thread 0x7ffff61a6700 (LWP 4867)]
[New Thread 0x7ffff51a4700 (LWP 4868)]
[New Thread 0x7fffe7fff700 (LWP 4869)]
[New Thread 0x7fffe6ffd700 (LWP 4870)]
[New Thread 0x7fffe5ffb700 (LWP 4872)]
[Thread 0x7ffff49a3700 (LWP 4865) exited]
[Thread 0x7fffe67fc700 (LWP 4862) exited]
[Thread 0x7ffff51a4700 (LWP 4868) exited]
[Thread 0x7fffe5ffb700 (LWP 4872) exited]
[Thread 0x7ffff59a5700 (LWP 4866) exited]
[Thread 0x7fffe77fe700 (LWP 4864) exited]
[Thread 0x7fffe7fff700 (LWP 4869) exited]
[Thread 0x7ffff61a6700 (LWP 4867) exited]
[Thread 0x7fffe6ffd700 (LWP 4870) exited]
Time for 9 workers (wall time, total workers time): 1.342932 / 11.987668
[New Thread 0x7fffe5ffb700 (LWP 4885)]
[New Thread 0x7fffe6ffd700 (LWP 4886)]
[New Thread 0x7fffe7fff700 (LWP 4887)]
[New Thread 0x7ffff51a4700 (LWP 4889)]
[New Thread 0x7ffff61a6700 (LWP 4891)]
[New Thread 0x7ffff59a5700 (LWP 4892)]
[New Thread 0x7ffff49a3700 (LWP 4893)]
[New Thread 0x7fffe77fe700 (LWP 4894)]
[New Thread 0x7fffe67fc700 (LWP 4895)]
[New Thread 0x7fffe57fa700 (LWP 4896)]
[Thread 0x7fffe7fff700 (LWP 4887) exited]
[Thread 0x7fffe67fc700 (LWP 4895) exited]
[Thread 0x7fffe5ffb700 (LWP 4885) exited]
[Thread 0x7ffff59a5700 (LWP 4892) exited]
[Thread 0x7ffff61a6700 (LWP 4891) exited]
[Thread 0x7fffe57fa700 (LWP 4896) exited]
[Thread 0x7fffe6ffd700 (LWP 4886) exited]
[Thread 0x7ffff49a3700 (LWP 4893) exited]
[Thread 0x7ffff51a4700 (LWP 4889) exited]
Time for 10 workers (wall time, total workers time): 1.206425 / 11.922069
[Thread 0x7fffe77fe700 (LWP 4894) exited]
[New Thread 0x7fffe57fa700 (LWP 4908)]
[New Thread 0x7fffe67fc700 (LWP 4909)]
[New Thread 0x7fffe77fe700 (LWP 4910)]
[New Thread 0x7ffff49a3700 (LWP 4911)]
[New Thread 0x7ffff61a6700 (LWP 4912)]
[New Thread 0x7ffff59a5700 (LWP 4913)]
[New Thread 0x7ffff51a4700 (LWP 4914)]
[New Thread 0x7fffe7fff700 (LWP 4915)]
[New Thread 0x7fffe6ffd700 (LWP 4916)]
[New Thread 0x7fffe5ffb700 (LWP 4917)]
[New Thread 0x7fffe4ff9700 (LWP 4918)]
[Thread 0x7ffff61a6700 (LWP 4912) exited]
[Thread 0x7ffff51a4700 (LWP 4914) exited]
[Thread 0x7fffe7fff700 (LWP 4915) exited]
[Thread 0x7ffff59a5700 (LWP 4913) exited]
[Thread 0x7fffe5ffb700 (LWP 4917) exited]
[Thread 0x7fffe6ffd700 (LWP 4916) exited]
[Thread 0x7fffe77fe700 (LWP 4910) exited]
[Thread 0x7fffe57fa700 (LWP 4908) exited]
[Thread 0x7fffe4ff9700 (LWP 4918) exited]
[Thread 0x7ffff49a3700 (LWP 4911) exited]
[Thread 0x7fffe67fc700 (LWP 4909) exited]
Time for 11 workers (wall time, total workers time): 1.092783 / 11.943834
[New Thread 0x7fffe4ff9700 (LWP 4926)]
[New Thread 0x7fffe5ffb700 (LWP 4927)]
[New Thread 0x7fffe6ffd700 (LWP 4928)]
[New Thread 0x7fffe7fff700 (LWP 4929)]
[New Thread 0x7ffff61a6700 (LWP 4930)]
[New Thread 0x7ffff59a5700 (LWP 4931)]
[New Thread 0x7ffff5164700 (LWP 4932)]
[New Thread 0x7ffff4963700 (LWP 4933)]
[New Thread 0x7fffe77fe700 (LWP 4934)]
[New Thread 0x7fffe67fc700 (LWP 4935)]
[New Thread 0x7fffe57fa700 (LWP 4936)]
[New Thread 0x7fffc3fff700 (LWP 4938)]
[Thread 0x7fffe5ffb700 (LWP 4927) exited]
[Thread 0x7fffe6ffd700 (LWP 4928) exited]
[Thread 0x7fffe4ff9700 (LWP 4926) exited]
[Thread 0x7fffe7fff700 (LWP 4929) exited]
[Thread 0x7ffff5164700 (LWP 4932) exited]
[Thread 0x7fffe77fe700 (LWP 4934) exited]
[Thread 0x7fffc3fff700 (LWP 4938) exited]
[Thread 0x7fffe57fa700 (LWP 4936) exited]
[Thread 0x7ffff4963700 (LWP 4933) exited]
[Thread 0x7fffe67fc700 (LWP 4935) exited]
Time for 12 workers (wall time, total workers time): 1.034295 / 12.006508
[Thread 0x7ffff59a5700 (LWP 4931) exited]
[Thread 0x7ffff61a6700 (LWP 4930) exited]
[New Thread 0x7fffc3fff700 (LWP 4952)]
[New Thread 0x7fffe57fa700 (LWP 4953)]
[New Thread 0x7fffe67fc700 (LWP 4954)]
[New Thread 0x7fffe77fe700 (LWP 4956)]
[New Thread 0x7ffff61a6700 (LWP 4957)]
[New Thread 0x7ffff59a5700 (LWP 4958)]
[New Thread 0x7ffff51a4700 (LWP 4959)]
[New Thread 0x7ffff4963700 (LWP 4960)]
[New Thread 0x7fffe7fff700 (LWP 4961)]
[New Thread 0x7fffe6ffd700 (LWP 4962)]
[New Thread 0x7fffe5ffb700 (LWP 4963)]
[New Thread 0x7fffe4ff9700 (LWP 4964)]
[New Thread 0x7fffc37fe700 (LWP 4965)]
[Thread 0x7fffc3fff700 (LWP 4952) exited]
[Thread 0x7ffff51a4700 (LWP 4959) exited]
[Thread 0x7fffe6ffd700 (LWP 4962) exited]
[Thread 0x7fffe7fff700 (LWP 4961) exited]
[Thread 0x7fffe57fa700 (LWP 4953) exited]
[Thread 0x7fffe77fe700 (LWP 4956) exited]
[Thread 0x7fffe4ff9700 (LWP 4964) exited]
[Thread 0x7ffff59a5700 (LWP 4958) exited]
[Thread 0x7ffff61a6700 (LWP 4957) exited]
[Thread 0x7ffff4963700 (LWP 4960) exited]
[Thread 0x7fffe5ffb700 (LWP 4963) exited]
Time for 13 workers (wall time, total workers time): 0.948788 / 11.957615
[Thread 0x7fffc37fe700 (LWP 4965) exited]
[Thread 0x7fffe67fc700 (LWP 4954) exited]
[New Thread 0x7fffc37fe700 (LWP 5012)]
[New Thread 0x7fffe4ff9700 (LWP 5013)]
[New Thread 0x7fffe5ffb700 (LWP 5014)]
[New Thread 0x7fffe6ffd700 (LWP 5015)]
[New Thread 0x7ffff61a6700 (LWP 5016)]
[New Thread 0x7ffff59a5700 (LWP 5017)]
[New Thread 0x7ffff5184700 (LWP 5018)]
[New Thread 0x7ffff4963700 (LWP 5019)]
[New Thread 0x7fffe7fff700 (LWP 5020)]
[New Thread 0x7fffe77fe700 (LWP 5021)]
[New Thread 0x7fffe67fc700 (LWP 5022)]
[New Thread 0x7fffe57fa700 (LWP 5023)]
[New Thread 0x7fffc3fff700 (LWP 5024)]
[New Thread 0x7fffc2ffd700 (LWP 5025)]
[Thread 0x7fffc2ffd700 (LWP 5025) exited]
[Thread 0x7fffe6ffd700 (LWP 5015) exited]
[Thread 0x7fffe57fa700 (LWP 5023) exited]
[Thread 0x7fffc37fe700 (LWP 5012) exited]
[Thread 0x7fffe4ff9700 (LWP 5013) exited]
[Thread 0x7fffc3fff700 (LWP 5024) exited]
[Thread 0x7ffff5184700 (LWP 5018) exited]
[Thread 0x7ffff59a5700 (LWP 5017) exited]
[Thread 0x7fffe5ffb700 (LWP 5014) exited]
[Thread 0x7ffff4963700 (LWP 5019) exited]
[Thread 0x7fffe77fe700 (LWP 5021) exited]
[Thread 0x7fffe7fff700 (LWP 5020) exited]
[Thread 0x7ffff61a6700 (LWP 5016) exited]
Time for 14 workers (wall time, total workers time): 0.874361 / 11.954743
[Thread 0x7fffe67fc700 (LWP 5022) exited]
[New Thread 0x7fffc2ffd700 (LWP 5046)]
[New Thread 0x7fffc3fff700 (LWP 5047)]
[New Thread 0x7fffe57fa700 (LWP 5048)]
[New Thread 0x7fffe67fc700 (LWP 5049)]
[New Thread 0x7ffff61a6700 (LWP 5050)]
[New Thread 0x7ffff59a5700 (LWP 5051)]
[New Thread 0x7ffff5164700 (LWP 5052)]
[New Thread 0x7ffff4963700 (LWP 5053)]
[New Thread 0x7fffe7fff700 (LWP 5054)]
[New Thread 0x7fffe77fe700 (LWP 5055)]
[New Thread 0x7fffe6ffd700 (LWP 5056)]
[New Thread 0x7fffe5ffb700 (LWP 5058)]
[New Thread 0x7fffe4fd9700 (LWP 5059)]
[New Thread 0x7fffc37fe700 (LWP 5060)]
[New Thread 0x7fffc27fc700 (LWP 5061)]
[Thread 0x7fffc2ffd700 (LWP 5046) exited]
[Thread 0x7fffe77fe700 (LWP 5055) exited]
[Thread 0x7fffe6ffd700 (LWP 5056) exited]
[Thread 0x7fffe7fff700 (LWP 5054) exited]
[Thread 0x7fffe4fd9700 (LWP 5059) exited]
[Thread 0x7ffff61a6700 (LWP 5050) exited]
[Thread 0x7fffe5ffb700 (LWP 5058) exited]
[Thread 0x7ffff59a5700 (LWP 5051) exited]
[Thread 0x7ffff4963700 (LWP 5053) exited]
[Thread 0x7ffff5164700 (LWP 5052) exited]
[Thread 0x7fffe67fc700 (LWP 5049) exited]
[Thread 0x7fffc27fc700 (LWP 5061) exited]
[Thread 0x7fffc3fff700 (LWP 5047) exited]
[Thread 0x7fffe57fa700 (LWP 5048) exited]
[Thread 0x7fffc37fe700 (LWP 5060) exited]
Time for 15 workers (wall time, total workers time): 0.844295 / 12.015978
[New Thread 0x7fffc27fc700 (LWP 5071)]
[New Thread 0x7fffc37fe700 (LWP 5073)]
[New Thread 0x7fffe4fd9700 (LWP 5074)]
[New Thread 0x7fffe5ffb700 (LWP 5075)]
[New Thread 0x7ffff61a6700 (LWP 5078)]
[New Thread 0x7ffff59a5700 (LWP 5079)]
[New Thread 0x7ffff51a4700 (LWP 5080)]
[New Thread 0x7ffff4923700 (LWP 5081)]
[New Thread 0x7fffe7fff700 (LWP 5082)]
[New Thread 0x7fffe77fe700 (LWP 5083)]
[New Thread 0x7fffe6ffd700 (LWP 5084)]
[New Thread 0x7fffe67fc700 (LWP 5085)]
[New Thread 0x7fffe57da700 (LWP 5086)]
[New Thread 0x7fffc3fff700 (LWP 5087)]
[New Thread 0x7fffc2ffd700 (LWP 5088)]
[New Thread 0x7fffc1ffb700 (LWP 5089)]
[Thread 0x7fffc37fe700 (LWP 5073) exited]
[Thread 0x7fffe5ffb700 (LWP 5075) exited]
[Thread 0x7fffe4fd9700 (LWP 5074) exited]
[Thread 0x7fffe6ffd700 (LWP 5084) exited]
[Thread 0x7fffc27fc700 (LWP 5071) exited]
[Thread 0x7fffe7fff700 (LWP 5082) exited]
[Thread 0x7fffc2ffd700 (LWP 5088) exited]
[Thread 0x7fffe67fc700 (LWP 5085) exited]
[Thread 0x7fffe77fe700 (LWP 5083) exited]
[Thread 0x7fffc1ffb700 (LWP 5089) exited]
[Thread 0x7ffff61a6700 (LWP 5078) exited]
[Thread 0x7ffff59a5700 (LWP 5079) exited]
[Thread 0x7ffff4923700 (LWP 5081) exited]
[Thread 0x7ffff51a4700 (LWP 5080) exited]
[Thread 0x7fffc3fff700 (LWP 5087) exited]
[Thread 0x7fffe57da700 (LWP 5086) exited]
Time for 16 workers (wall time, total workers time): 0.765070 / 11.864572
Total work done (sum of squares from 100 to 5000000000): 470444716059854368
Total time taken: 40.0160 seconds
Cumulative time for all workers: 189.4592 seconds
[Inferior 2 (process 1164) exited normally]

@MichelleJin12
Copy link
Contributor

Before patch

Compiler                  gcc                                           clang                                       
Runtime test case         pg             finstrument-fu fpatchable-fun  pg             finstrument-fu fpatchable-fun
------------------------: O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os  O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os
047 signal2             : OK OK OK OK OK OK OK NG OK OK OK OK NG OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
124 exception           : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
141 recv_basic          : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK NZ OK OK OK OK OK OK
182 thread_exit         : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK SG OK OK OK OK
183 info_quote          : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
184 arg_enum            : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
185 exception2          : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
186 exception3          : OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG  OK OK OK OK OK NG NG NG NG NG NG NG NG NG NG
204 arg_dwarf4          : OK OK NG NG OK SK SK SK SK SK OK OK NG NG OK  OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK
220 trace_script        : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG  NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
226 default_opts        : OK OK OK OK NG NG OK OK OK OK OK NG NG OK OK  NG NG OK OK OK OK OK OK OK NG NG OK OK NG NG
251 exception4          : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK NZ NZ NZ NZ NZ OK OK OK OK OK
255 arg_dwarf6          : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  NG NG NG NG NG SK SK SK SK SK NG NG NG NG NG
273 agent_basic         : NZ OK OK NZ OK NZ OK OK OK OK OK OK OK OK OK  OK OK OK OK NZ OK OK OK OK OK OK OK OK OK OK
281 agent_trace_toggle  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK NZ OK NZ OK NZ OK OK OK OK OK OK OK OK OK
282 agent_depth         : OK OK OK OK OK NZ OK NZ OK NZ OK NZ OK OK OK  OK OK OK OK OK OK OK OK OK NZ OK OK OK OK OK
283 agent_time          : NG NG NZ NG NG NG NG NG NG NG OK NG NG NG NG  OK NG NG NG OK NG NG NG NG NG OK OK NG OK NG
284 agent_filter        : OK OK NZ OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
285 agent_caller_filter : OK NZ OK OK OK OK OK NZ OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
286 agent_trigger       : NG NG NG NG NG NZ OK NG NG NG NG NG OK NG NG  OK OK OK OK NG NG OK NG NG OK OK OK NG NG NG
287 arg_enum3           : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK
288 arg_oct             : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
289 exception5          : OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK  OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK

runtime test stats
====================
total  8700  Tests executed (success: 82.51%)
  OK:  7048  Test succeeded
  OK:   130  Test succeeded (with some fixup)
  NG:   133  Different test result
  NZ:    23  Non-zero return value
  SG:     1  Abnormal exit by signal
  TM:     0  Test ran too long
  BI:    60  Build failed
  LA:     0  Unsupported Language
  SK:  1305  Skipped

After patch2

045 report_avg_self     : OK OK OK OK OK OK OK OK OK OK OK OK NG OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
047 signal2             : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK NG OK OK OK OK OK OK OK OK OK OK NG
110 replay_time_T       : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK NG OK OK OK OK OK OK OK OK OK OK OK OK OK
124 exception           : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
141 recv_basic          : OK OK OK OK OK OK OK OK OK OK OK OK NZ OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
182 thread_exit         : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG  NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
185 exception2          : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK
186 exception3          : OK OK OK OK OK OK OK OK OK OK NG NG NG NG NG  OK OK OK OK OK NG NG NG NG NG NG NG NG NG NG
204 arg_dwarf4          : OK OK NG NG OK SK SK SK SK SK OK OK NG NG OK  OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK
220 trace_script        : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG  NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
226 default_opts        : OK OK OK OK NG NG OK OK OK OK NG OK OK OK OK  OK OK NG NG OK OK OK OK NG NG OK OK OK OK OK
238 report_field2       : OK OK OK OK OK OK OK OK OK NG OK OK OK NG OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
251 exception4          : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK NZ NZ NZ NZ NZ OK OK OK OK OK
255 arg_dwarf6          : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  NG NG NG NG NG SK SK SK SK SK NG NG NG NG NG
273 agent_basic         : OK OK OK OK OK OK OK OK OK NZ OK NZ OK NZ OK  NZ OK OK OK NZ OK OK NZ OK OK OK OK OK OK OK
281 agent_trace_toggle  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK NZ  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
282 agent_depth         : OK OK OK OK OK OK OK OK OK OK OK NZ OK OK OK  NZ OK OK OK OK OK OK OK OK OK OK OK OK OK OK
283 agent_time          : NG NZ OK OK OK NG NG NG NG NG NG NG NG OK NZ  NG NG NG NG NG NG NG OK NG NG NG NG NG NG NG
285 agent_caller_filter : OK OK OK OK OK OK OK OK OK OK OK NZ OK OK NZ  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
286 agent_trigger       : OK NG NG OK OK NG OK NG NG OK NG OK NG NG OK  NG OK OK OK NG NG NG OK OK OK OK OK NG NG NG
289 exception5          : OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK  OK OK OK OK OK NG NG NG NG NG OK OK OK OK OK

runtime test stats
====================
total  8700  Tests executed (success: 82.24%)
  OK:  7025  Test succeeded
  OK:   130  Test succeeded (with some fixup)
  NG:   160  Different test result
  NZ:    20  Non-zero return value
  SG:     0  Abnormal exit by signal
  TM:     0  Test ran too long
  BI:    60  Build failed
  LA:     0  Unsupported Language
  SK:  1305  Skipped

182 thread_exit : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG

All fails.. We need more investigation.

@MichelleJin12
Copy link
Contributor

MichelleJin12 commented Jul 21, 2024

If remove pthread_exit, can't trace pthread_exit.

$ cd tests
$ gcc -pg s-thread-exit.c -pthread
$ uftrace a.out

Answer

# DURATION    TID     FUNCTION
            [26832] | main() {
            [26832] |   pthread_create() {
  51.697 us [26832] |   } /* pthread_create */
            [26832] |   pthread_create() {
  32.395 us [26832] |   } /* pthread_create */
            [26832] |   pthread_join() {
            [26836] | thread_main() {
            [26836] |   printf() {
  17.092 us [26836] |   } /* printf */
            [26836] |   pthread_exit() {
            [26837] | thread_main() {
            [26837] |   printf() {
   5.480 us [26837] |   } /* printf */
            [26837] |   pthread_exit() {
 362.442 us [26832] |   } /* pthread_join */
            [26832] |   pthread_join() {
   1.000 us [26832] |   } /* pthread_join */
 457.662 us [26832] | } /* main */

After patch2 result

$ uftrace ./a.out 
1.000000
1.000000
# DURATION     TID      FUNCTION
   0.597 us [ 241052] | __monstartup();
   0.224 us [ 241052] | __cxa_atexit();
            [ 241052] | main() {
  61.820 us [ 241052] |   pthread_create();
  62.611 us [ 241052] |   pthread_create();
            [ 241052] |   pthread_join() {
            [ 241055] | thread_main() {
  67.348 us [ 241055] |   printf();
            [ 241056] | thread_main() {
  36.510 us [ 241056] |   printf();
   1.011 ms [ 241052] |   } /* pthread_join */
   0.268 us [ 241052] |   pthread_join();
   1.138 ms [ 241052] | } /* main */

uftrace stopped tracing with remaining functions
================================================
task: 241055
[0] thread_main

task: 241056
[0] thread_main

@paranlee
Copy link
Contributor

paranlee commented Jul 21, 2024

This issue appears to be an x86 only issue at this time.

Below is the result without patched on arm64 laptop.

$ uname -a
Linux fedora 6.10.0-0.rc7.58.fc41.aarch64 #1 SMP PREEMPT_DYNAMIC Tue Jul  9 05:11:42 KST 2024 aarch64 GNU/Linux

$ uftrace --version
uftrace v0.16-11-g804a ( aarch64 dwarf python3 luajit tui perf sched dynamic kernel )

$ uftrace record sum_of_squares-O0-pg 
Time for 1 workers (wall time, total workers time): 3.779770 / 3.779519
Time for 2 workers (wall time, total workers time): 1.955173 / 3.907919
Time for 3 workers (wall time, total workers time): 1.342060 / 4.022616
Time for 4 workers (wall time, total workers time): 1.210077 / 4.776442
Time for 5 workers (wall time, total workers time): 1.235631 / 6.080350
Time for 6 workers (wall time, total workers time): 1.363751 / 6.876253
Time for 7 workers (wall time, total workers time): 1.272646 / 7.838870
Time for 8 workers (wall time, total workers time): 1.256382 / 9.250987
Time for 9 workers (wall time, total workers time): 1.282801 / 10.685252
Time for 10 workers (wall time, total workers time): 1.296283 / 12.057002
Time for 11 workers (wall time, total workers time): 1.272074 / 12.737434
Time for 12 workers (wall time, total workers time): 1.270198 / 14.185802
Time for 13 workers (wall time, total workers time): 1.280878 / 15.667125
Time for 14 workers (wall time, total workers time): 1.296344 / 16.875531
Time for 15 workers (wall time, total workers time): 1.293961 / 18.002395
Time for 16 workers (wall time, total workers time): 1.284461 / 19.564768
Total work done (sum of squares from 100 to 5000000000): 470444716059854368
Total time taken: 23.6925 seconds
Cumulative time for all workers: 166.3083 seconds
...
   0.333 us [   5198] |   gettimeofday();
            [   5198] |   pthread_exit() {
            [   5198] |     /* linux:task-exit */
   0.292 us [   5197] |   gettimeofday();
            [   5197] |   pthread_exit() {
            [   5197] |     /* linux:task-exit */
   1.955  s [   5194] |       /* linux:sched-in */
   1.955  s [   5194] |     } /* pthread_join */
   0.208 us [   5194] |     pthread_join();
   0.125 us [   5194] |     gettimeofday();
   1.955  s [   5194] |   }
...

@namhyung
Copy link
Owner

I wonder if there was a change about the exit path in libpthread recently.

@paranlee
Copy link
Contributor

paranlee commented Jul 24, 2024

Thank you @namhyung, based on c7e105, I'll check libpthread in older versions of libc to see if the pthread_exit function's implementation has changed. I've checked the issue in the libpthread 2.17 version on RHEL 7.

$ ldd ./sum_of_squares-O2-pg
	linux-vdso.so.1 =>  (0x00007ffe72efe000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f279896c000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f279859e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2798b88000)
$ ls -ahl /lib64/libpthread.so.0
lrwxrwxrwx. 1 root root 18 Jun 30  2022 /lib64/libpthread.so.0 -> libpthread-2.17.so

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants