Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pyroscope libraries resulting in segmentation fault #20

Closed
sentient-glare opened this issue Feb 21, 2023 · 5 comments
Closed

Pyroscope libraries resulting in segmentation fault #20

sentient-glare opened this issue Feb 21, 2023 · 5 comments
Assignees

Comments

@sentient-glare
Copy link

sentient-glare commented Feb 21, 2023

Following the instructions for the dotnet beta integration. I am trying to profile an ASPNET application with Pyroscope.

Dockerfile below - this is largely copied from a Microsoft example. When I build and run this, the container exits with a 139 code (seg fault).

Things I've tried:

  • Downgrading Pyroscope libraries from 0.7.2 to 0.7.0
  • Using Debian and Ubuntu images instead of Alpine
  • Messing around with the libc6-compat library instead of gcompat
FROM mcr.microsoft.com/dotnet/sdk:6.0-alpine3.16 AS build
WORKDIR /sln

# Copy project file and restore
COPY "TestApp.Api.csproj" "."
RUN dotnet restore TestApp.Api.csproj

# Copy the actual source code
COPY "." "."

# Build and publish the app
RUN dotnet publish "TestApp.Api.csproj" -c Release -o /app/publish --self-contained false --no-restore

#FINAL image
FROM mcr.microsoft.com/dotnet/aspnet:6.0-alpine3.16
WORKDIR /app
COPY --from=build /app/publish .

# Set up Pyroscope profiling
RUN apk update && apk add gcompat
RUN wget -P /lib/ https://github.com/pyroscope-io/pyroscope-dotnet/releases/download/v0.7.0/Pyroscope.Linux.ApiWrapper.x64.so
RUN wget -P /lib/ https://github.com/pyroscope-io/pyroscope-dotnet/releases/download/v0.7.0/Pyroscope.Profiler.Native.so
ENV PYROSCOPE_APPLICATION_NAME=test.dotnet.app
ENV PYROSCOPE_SERVER_ADDRESS=http://pyroscope-route:80
ENV PYROSCOPE_PROFILING_ENABLED=1
ENV CORECLR_ENABLE_PROFILING=1
ENV CORECLR_PROFILER={BD1A650D-AC5D-4896-B64F-D6FA25D6B26A}
ENV CORECLR_PROFILER_PATH=/lib/Pyroscope.Profiler.Native.so
ENV LD_PRELOAD=/lib/Pyroscope.Linux.ApiWrapper.x64.so

ENTRYPOINT ["dotnet", "TestApp.Api.dll"]
@korniltsev
Copy link
Collaborator

I believe we will need to build separate binaries for alpine/musl

@korniltsev korniltsev self-assigned this Feb 22, 2023
@sentient-glare
Copy link
Author

Gotcha - using Alpine isn't a requirement for me but I got similar results with Debian and Ubuntu

@korniltsev
Copy link
Collaborator

oops. I will look into it.

@korniltsev
Copy link
Collaborator

@sentient-glare I tried to reproduce the crash on glibc based base image and could not reproduce

Here is what I tried:
https://github.com/pyroscope-io/pyroscope/blob/fix/crash_dotnet_example/examples/dotnet/web-new/Dockerfile

I wonder what is the difference with your setup on debian base image.
Is there any chance you could share a stacktrace of segmentation fault? or even better a coredump? or even better full self-contained example crashing(not alpine)?

Btw, what is your host OS and architecture?

@sentient-glare
Copy link
Author

Sorry for the wait on this. I tried out a Dockerfile similar to yours and it worked fine for me, I think I didn't track my local testing all that well and it was using Alpine images that was the issue all along and/or was doing some funky stuff with the libraries 😢 the debian images all work for me now. thanks for the help!

korniltsev pushed a commit that referenced this issue Oct 23, 2024
…#5808)

## Summary of changes

Prevent deadlock betwen signal-based profilers (walltime/manual cpu
profilers) and non-signal based profilers (exception, contention....)

## Reason for change

When an exception occurs, the thread can be interrupted by a
signal-based profiler (walltime/manual cpu). It can be interrupted while
holding the lock used to update the `dl-iterate-phdr` cache.

```
Thread 18 (LWP 995):
#0  __syscall_cp_c (nr=202, u=140244538814536, v=128, w=-1, x=0, y=0, z=0) at ./arch/x86_64/syscall_arch.h:61
#1  0x00007f8dba343ccd in __futex4_cp (to=0x0, val=-1, op=128, addr=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at src/thread/__timedwait.c:24
#2  __timedwait_cp (addr=addr@entry=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>, val=val@entry=-1, clk=clk@entry=0, at=at@entry=0x0, priv=priv@entry=128) at src/thread/__timedwait.c:52
#3  0x00007f8dba343d74 in __timedwait (addr=addr@entry=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>, val=-1, clk=clk@entry=0, at=at@entry=0x0, priv=128) at src/thread/__timedwait.c:68
#4  0x00007f8dba3463e6 in __pthread_rwlock_timedrdlock (at=<optimized out>, rw=<optimized out>) at src/thread/pthread_rwlock_timedrdlock.c:18
#5  __pthread_rwlock_timedrdlock (rw=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>, at=0x0) at src/thread/pthread_rwlock_timedrdlock.c:3
#6  0x00007f8d398f3ca8 in std::__glibcxx_rwlock_rdlock (__rwlock=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:73
#7  std::__shared_mutex_pthread::lock_shared (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:224
#8  std::shared_mutex::lock_shared (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:421
#9  std::shared_lock<std::shared_mutex>::shared_lock (this=0x7f4ca05a2ac0, __m=...) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:722
#10 LibrariesInfoCache::DlIteratePhdrImpl (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>, callback=0x7f8d3997d900 <_Ux86_64_dwarf_callback>, data=0x7f4ca05a2b20) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LibrariesInfoCache.cpp:104
#11 0x00007f8d3997e4ee in _Ux86_64_dwarf_find_proc_info (as=0x7f8d39eb2a00 <local_addr_space>, ip=140246691112115, pi=0x7f4ca05a3170, need_unwind_info=1, arg=0x7f4ca05a3411) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gfind_proc_info-lsb.c:807
#12 0x00007f8d3997e690 in fetch_proc_info (c=0x7f4ca05a3018, ip=140246691112115) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:473
#13 0x00007f8d3998113d in find_reg_state (sr=0x7f4ca05a2dc0, c=0x7f4ca05a3018) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:1024
#14 _Ux86_64_dwarf_step (c=c@entry=0x7f4ca05a3018) at /project/obj/libunwind-prefix/src/libunwind/src/dwarf/Gparser.c:1069
#15 0x00007f8d3997d13a in _Ux86_64_step (cursor=0x7f4ca05a3018) at /project/obj/libunwind-prefix/src/libunwind/src/x86_64/Gstep.c:75
#16 0x00007f8d398f55c8 in LinuxStackFramesCollector::CollectStackManually (this=this@entry=0x7f8d392dc6d0, ctx=ctx@entry=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:288
#17 0x00007f8d398f53dc in LinuxStackFramesCollector::CollectCallStackCurrentThread (this=this@entry=0x7f8d392dc6d0, ctx=ctx@entry=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:227
#18 0x00007f8d398f4672 in LinuxStackFramesCollector::CollectStackSampleSignalHandler (signal=<optimized out>, info=<optimized out>, context=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:373
#19 0x00007f8d398fb871 in ProfilerSignalManager::CallCustomHandler (this=0x7f8d39eaf928 <ProfilerSignalManager::Get(int)::signalManagers+1944>, signal=10, info=0x7f4ca05a39b0, context=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp:197
#20 ProfilerSignalManager::SignalHandler (signal=10, info=0x7f4ca05a39b0, context=0x7f4ca05a3880) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/ProfilerSignalManager.cpp:188
#21 <signal handler called>
#22 __pthread_rwlock_unlock (rw=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at src/thread/pthread_rwlock_unlock.c:5
#23 0x00007f8d398f3bf9 in std::__glibcxx_rwlock_unlock (__rwlock=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:77
#24 std::__shared_mutex_pthread::unlock (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:208
#25 std::shared_mutex::unlock (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/shared_mutex:417
#26 std::unique_lock<std::shared_mutex>::unlock (this=0x7f4ca05a3e20) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/bits/unique_lock.h:194
#27 std::unique_lock<std::shared_mutex>::~unique_lock (this=0x7f4ca05a3e20) at /usr/lib/gcc/x86_64-alpine-linux-musl/10.3.1/../../../../include/c++/10.3.1/bits/unique_lock.h:103
#28 LibrariesInfoCache::UpdateCache (this=0x7f8d39eaf048 <LibrariesInfoCache::Get()::Instance>) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LibrariesInfoCache.cpp:88
#29 0x00007f8d398f4e59 in LinuxStackFramesCollector::CollectStackSampleImplementation (this=0x7f8d3b91bc90, pThreadInfo=0x7f4ca06b9900, pHR=0x7f8d3a63c510, selfCollect=true) at /p--Type <RET> for more, q to quit, c to continue without paging--
roject/profiler/src/ProfilerEngine/Datadog.Profiler.Native.Linux/LinuxStackFramesCollector.cpp:100
#30 0x00007f8d399637ba in StackFramesCollectorBase::CollectStackSample (this=0x7f8d3b91bc90, pThreadInfo=0x7f4ca06b9900, pHR=0x7f4ca05a3fdc) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native/StackFramesCollectorBase.cpp:185
#31 0x00007f8d3992acb9 in ExceptionsProvider::OnExceptionThrown (this=0x7f8d392a7160, thrownObjectId=139969739182080) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native/ExceptionsProvider.cpp:149
#32 0x00007f8d39917045 in CorProfilerCallback::ExceptionThrown (this=0x7f8d392c0d20, thrownObjectId=139969739182080) at /project/profiler/src/ProfilerEngine/Datadog.Profiler.Native/CorProfilerCallback.cpp:1734
```
## Implementation details

- move the call which updates the cache after acquiring the thread lock
- call Update before sending signal

## Test coverage

## Other details
<!-- Fixes #{issue} -->

<!-- ⚠️ Note: where possible, please obtain 2 approvals prior to
merging. Unless CODEOWNERS specifies otherwise, for external teams it is
typically best to have one review from a team member, and one review
from apm-dotnet. Trivial changes do not require 2 reviews. -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants