This repository has been archived by the owner on Nov 1, 2020. It is now read-only.

Linux stack walking performance #3784

Open
jkotas opened this issue Jun 2, 2017 · 13 comments

Comments

@jkotas
Member

jkotas commented Jun 2, 2017

We spend a lot of time in libunwind during stack walking on Unix:

  • Conversions between CoreRT CONTEXT report and libunwind context record
  • LLVM libunwind itself is slow because of its abstraction layers

We should look into replacing LLVM libunwind with our own DWARF unwinder that does just what we need (i.e., it may not need to support all DWARF codes), without the unnecessary overhead.

Mono has prior art for this that may be useful: https://github.com/mono/mono/blob/1debf3934120547b3003c0ec4ec90bae4b08ee13/mono/mini/unwind.c#L515
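
To make the "subset of DWARF codes" idea concrete, here is a minimal sketch of a call-frame-instruction interpreter that understands only a handful of opcodes and reports anything else as unsupported. It is not taken from CoreRT, Mono, or libunwind: the names CfaState, ReadUleb128, RunCfaSubset and ExamplePrologueProgram are hypothetical, and the choice of opcodes is an assumption made for illustration. The opcode encodings themselves follow the DWARF specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical unwind row; not a CoreRT, Mono, or libunwind type.
struct CfaState {
    uint64_t cfaRegister = 0;             // DWARF register the CFA is based on
    uint64_t cfaOffset = 0;               // CFA = value(cfaRegister) + cfaOffset
    std::map<uint64_t, int64_t> savedAt;  // register -> offset from CFA where it was saved
    uint64_t location = 0;                // code offset reached so far
    bool unsupported = false;             // set when an opcode outside the subset is seen
};

// Decode one unsigned LEB128 value and advance the cursor.
static uint64_t ReadUleb128(const std::vector<uint8_t>& code, std::size_t& pos) {
    uint64_t result = 0;
    unsigned shift = 0;
    while (pos < code.size() && shift < 64) {
        uint8_t byte = code[pos++];
        result |= static_cast<uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) break;
        shift += 7;
    }
    return result;
}

// Execute call frame instructions until the row covering targetOffset is complete.
CfaState RunCfaSubset(const std::vector<uint8_t>& code, uint64_t targetOffset,
                      uint64_t codeAlign, int64_t dataAlign) {
    CfaState s;
    std::size_t pos = 0;
    while (pos < code.size()) {
        uint8_t op = code[pos++];
        uint8_t high = op & 0xc0, low = op & 0x3f;
        if (high == 0x40) {                    // DW_CFA_advance_loc
            uint64_t next = s.location + low * codeAlign;
            if (next > targetOffset) break;    // the current row already covers the target
            s.location = next;
        } else if (high == 0x80) {             // DW_CFA_offset
            s.savedAt[low] = static_cast<int64_t>(ReadUleb128(code, pos)) * dataAlign;
        } else if (op == 0x00) {               // DW_CFA_nop
        } else if (op == 0x0c) {               // DW_CFA_def_cfa
            s.cfaRegister = ReadUleb128(code, pos);
            s.cfaOffset = ReadUleb128(code, pos);
        } else if (op == 0x0d) {               // DW_CFA_def_cfa_register
            s.cfaRegister = ReadUleb128(code, pos);
        } else if (op == 0x0e) {               // DW_CFA_def_cfa_offset
            s.cfaOffset = ReadUleb128(code, pos);
        } else {                               // everything else is outside the subset
            s.unsupported = true;
            break;
        }
    }
    return s;
}

// Instructions for a typical x86-64 "push rbp; mov rbp, rsp" prologue
// (register 7 = rsp, 6 = rbp, 16 = return address).
std::vector<uint8_t> ExamplePrologueProgram() {
    return {0x0c, 0x07, 0x08,   // def_cfa rsp+8
            0x90, 0x01,         // return address saved at CFA-8
            0x41,               // advance 1 byte (after push rbp)
            0x0e, 0x10,         // def_cfa_offset 16
            0x86, 0x02,         // rbp saved at CFA-16
            0x43,               // advance 3 bytes (after mov rbp, rsp)
            0x0d, 0x06};        // def_cfa_register rbp
}
```

Calling RunCfaSubset(ExamplePrologueProgram(), 2, 1, -8) yields the row for the instruction after the push. A real unwinder would also have to parse the CIE/FDE headers and decide what to do when unsupported is set, for example by falling back to a full unwinder.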

@am11
Member

am11 commented Jun 10, 2017

@jkotas, @janvorli, are there future plans for moving CoreCLR to also consume a lightweight cross-platform unwinder, in an effort to drop the libunwind dependency?

I found a related issue, dotnet/coreclr#872, but as far as I can tell, LLVM libunwind is already supported in the cmake script as a (degraded) fallback to the primary HP libunwind. I am not quite sure, but both of these libraries might be too heavy for CoreCLR use cases as well.

@jkotas
Member Author

jkotas commented Jun 11, 2017

We do not have plans like that currently.

CoreCLR already has its own more lightweight unwinder. CoreCLR is using libunwind to unwind from manually managed code only. It is a rare case, so it does not matter much that it is relatively slow. Unwinding from manually managed code needs to support all unwind codes that may potentially be generated by the C/C++ compiler, so limiting it to a subset of unwind codes to make it more lightweight is not very viable.

@am11
Member

am11 commented Apr 3, 2020

> CoreCLR is using libunwind to unwind from manually managed code only.

@jkotas, thanks for your explanation. :)

Now that mono and coreclr co-exist in the same repository, is it viable to use the same mechanism that mono uses for manually managed code as a fallback to libunwind, for platforms which do not have libunwind readily available (Solaris, QNX and so on)?

I tried to port parts of libunwind to Solaris last year; that port is now available in the very recent release candidate 1.5-rc1. However, although it compiles successfully, it crashes a lot during the tests due to some fundamental differences. Therefore, some work still needs to be done by somebody who is more fluent with stack unwinding, as I am not familiar with ucontext. Or, if we can borrow some fallback implementation from mono, that would be equally helpful for porting coreclr to new platforms (with unwinding support as good or bad as mono's, which is probably more acceptable than not having it at all).

@am11
Member

am11 commented Apr 3, 2020

Another idea was to implement unwinding in C# using something like https://github.com/konrad-kruczynski/elfsharp. :)

@jkotas
Member Author

jkotas commented Apr 3, 2020

CoreCLR is sensitive to having a properly behaving stack unwinder for manually managed code. I doubt that switching from one poorly debugged libunwind implementation to a different poorly debugged libunwind implementation would actually fix the crashes that you are seeing.

The ultimate fix would be to get rid of the coreclr dependency on libunwind by getting rid of FCALLs with HELPER_METHOD_FRAME, but that is a lot of work.

@janvorli
Member

janvorli commented Apr 3, 2020

> The ultimate fix would be to get rid of the coreclr dependency on libunwind by getting rid of FCALLs with HELPER_METHOD_FRAME, but that is a lot of work.

That would not be sufficient to get rid of the dependency on libunwind. We also use libunwind for the first pass of managed EH to walk through the runtime native code that's in between. We need to do that to find possible native handlers of the exception. Only the 2nd pass actually lets C++ EH unwind the native frames. See
https://github.com/dotnet/runtime/blob/66ded9cbe4126f401cf42022e99910a957b4ba7b/src/coreclr/src/vm/exceptionhandling.cpp#L4745-L4762

@am11
Member

am11 commented Jun 24, 2020

> CoreCLR is sensitive to having a properly behaving stack unwinder for manually managed code.

Is there a way to sanity-test this somehow from C#? That is, to intentionally create a situation in C# which causes an exception on the CLR native stack (in manually managed code), in order to assess the reliability of the unwinder.

@jkotas
Member Author

jkotas commented Jun 24, 2020

All places that call FCThrow will hit the unmanaged unwinder, and most places that call COMPlusThrow will hit it too.

Running all libraries tests is probably the best way to exercise a sufficient number of these.

@mjsabby
Contributor

mjsabby commented Jun 25, 2020

@jkotas how much is a lot of work? weeks? months? maybe years? It would be awesome to not have to care about things like libunwind.

Maybe it's worth a treatise to find out what it would take to get rid of each, even if that is farmed out to the community.

@jkotas
Member Author

jkotas commented Jun 25, 2020

months for sure

Here is an example of what it takes to convert one FCall with HMF: dotnet/runtime#1929. There are 400+ of these. If each of them is a 50-line delta, this would be a ~20,000-line delta in total.

We would also need to do something about the special uses of the unwinder like the one @janvorli mentioned, but that's probably a cheaper problem than converting all FCalls.

@am11
Member

am11 commented Jun 25, 2020

Thank you. It was easier to spot at least one weakness in locating native frame in pass1 on SmartOS: dotnet/runtime#38373.

Sounds like converting FCalls to QCalls is generally a good thing. There are 503 occurrences of [MethodImpl(MethodImplOptions.InternalCall)] in total (with and without HELPER_METHOD_FRAME).

$ git grep '\[MethodImpl(MethodImplOptions.InternalCall)\]' :/src/coreclr/*.cs | wc -l
503

Folks are also porting/fixing libunwind for QNX OS and the MIPS arch for CoreCLR. If we instead combine the effort to get rid of libunwind in a few months, I think it will eventually improve the overall portability of the runtime.

PS: like CoreRT, rust-lang also uses llvm-libunwind for some targets, but it is optional. By default it has a minimal implementation written in Rust to cater to its exception handling needs. Maybe we can also implement a minimal unwinder in C# for CoreRT, using elfsharp etc.

@janvorli
Member

> It was easier to spot at least one weakness in locating native frame in pass1 on SmartOS: dotnet/runtime#38373.

That behavior is correct and equivalent to Linux.

@janvorli
Member

> All places that call FCThrow will hit the unmanaged unwinder, and most places that call COMPlusThrow will hit it too.

The other important place, as I've mentioned before, is the case where managed exception handling needs to unwind through native frames that are in between managed frames. One example of such a scenario is throwing an exception from a method that was called via reflection and catching it at the caller site.

The call stack you get at the throw is below. You can see that there is a managed frame, reflectioninvoke.Program.Test, then there are four native frames, and then managed frames again. The unwinding will start at reflectioninvoke.Program.Test, and the Windows-style managed unwinder will unwind to libcoreclr.so!CallDescrWorkerInternal. Then we will start unwinding using the libunwind unwinder and checking NativeExceptionHolders in the unwound range. Those holders represent places in native code where we can catch the PAL_SEHException that we use internally to propagate managed exceptions. The holders have an InvokeFilter method that is called to decide whether the exception will be handled by that native frame or not. If it returns true, we have found the handler and we switch to the 2nd pass of exception handling, which doesn't use our libunwind (it uses standard C++ exception handling for native frames). We start again from reflectioninvoke.Program.Test in a similar manner, but once we reach libcoreclr.so!CallDescrWorkerInternal, we throw the native PAL_SEHException from that context using PAL_ThrowExceptionFromContext. Standard C++ exception handling kicks in and the exception is caught by the C++ catch at RuntimeMethodHandle::InvokeMethod (that catch is generated by the HELPER_METHOD_FRAME_BEGIN_RET_PROTECT macro). Then we "rethrow" the managed exception from the System.RuntimeMethodHandle.InvokeMethod frame (we just call DispatchManagedException with the context set to that frame).

        Child SP               IP Call Site
00007FFFFFFFCED8 00007FFFF6179740 libcoreclr.so!UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*)
00007FFFFFFFCEE0 00007FFFF6179B96 libcoreclr.so!DispatchManagedException(PAL_SEHException&, bool) + 134
00007FFFFFFFD3E0 00007FFFF60F65F8 libcoreclr.so!IL_Throw(Object*) + 600
00007FFFFFFFD3F0                  [HelperMethodFrame: 00007fffffffd3f0]
00007FFFFFFFD560 00007FFF7CD93B3D exceptionfrominvoke.dll!reflectioninvoke.Program.Test(Int32) + 77 [/home/janvorli/test/exceptionfrominvoke/Program.cs @ 30]
00007FFFFFFFD580 00007FFFF6186FBF libcoreclr.so!CallDescrWorkerInternal + 124
00007FFFFFFFD5A0 00007FFFF60B28D5 libcoreclr.so!CallDescrWorkerWithHandler(CallDescrData*, int) + 117
00007FFFFFFFD5E0 00007FFFF6126E33 libcoreclr.so!CallDescrWorkerReflectionWrapper(CallDescrData*, Frame*) + 131
00007FFFFFFFD660 00007FFFF6127E2B libcoreclr.so!RuntimeMethodHandle::InvokeMethod(Object*, PtrArray*, SignatureNative*, bool, bool) + 3307
00007FFFFFFFD7F0                  [DebuggerU2MCatchHandlerFrame: 00007fffffffd7f0]
00007FFFFFFFD8E8                  [HelperMethodFrame_PROTECTOBJ: 00007fffffffd8e8] System.RuntimeMethodHandle.InvokeMethod(System.Object, System.Object[], System.Signature, Boolean, Boolean)
00007FFFFFFFDA60 00007FFF7C96F159 System.Private.CoreLib.dll!System.Reflection.RuntimeMethodInfo.Invoke(System.Object, System.Reflection.BindingFlags, System.Reflection.Binder, System.Object[], System.Globalization.CultureInfo) + 153
00007FFFFFFFDAA0 00007FFF7C969928 System.Private.CoreLib.dll!System.Reflection.MethodBase.Invoke(System.Object, System.Object[]) + 24
00007FFFFFFFDAB0 00007FFF7CD90AD9 exceptionfrominvoke.dll!reflectioninvoke.Program.Main(System.String[]) + 345 [/home/janvorli/test/exceptionfrominvoke/Program.cs @ 37]
00007FFFFFFFDB50 00007FFFF6186FBF libcoreclr.so!CallDescrWorkerInternal + 124
00007FFFFFFFDB70 00007FFFF60B336B libcoreclr.so!MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 1643
00007FFFFFFFDD50 00007FFFF619CD9A libcoreclr.so!RunMain(MethodDesc*, short, int*, PtrArray**) + 826
00007FFFFFFFDF50 00007FFFF619D0D9 libcoreclr.so!Assembly::ExecuteMainMethod(PtrArray**, int) + 393
00007FFFFFFFE250 00007FFFF5FE72D3 libcoreclr.so!CorHost2::ExecuteAssembly(unsigned int, char16_t const*, int, char16_t const**, unsigned int*) + 627
00007FFFFFFFE330 00007FFFF5FCB75D libcoreclr.so!coreclr_execute_assembly + 413
00007FFFFFFFE3A0 00007FFFF68AE71A libhostpolicy.so!run_app_for_context(hostpolicy_context_t const&, int, char const**) + 826
00007FFFFFFFE460 00007FFFF68AEB81 libhostpolicy.so!run_app(int, char const**) + 49
00007FFFFFFFE490 00007FFFF68AF2F4 libhostpolicy.so!corehost_main + 212
00007FFFFFFFE570 00007FFFF6AFFBBE libhostfxr.so!fx_muxer_t::handle_exec_host_command(std::string const&, host_startup_info_t const&, std::string const&, std::unordered_map<known_options, std::vector<std::string, std::allocator<std::string> >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::string, std::allocator<std::string> > > > > const&, int, char const**, int, host_mode_t, char*, int, int*) + 1774
00007FFFFFFFE6A0 00007FFFF6AFE303 libhostfxr.so!fx_muxer_t::execute(std::string, int, char const**, host_startup_info_t const&, char*, int, int*) + 643
00007FFFFFFFE780 00007FFFF6AFAC24 libhostfxr.so!hostfxr_main_startupinfo + 148
00007FFFFFFFE800 0000555555558C12 dotnet!exe_start(int, char const**) + 770
00007FFFFFFFE8A0 00005555555593E0 dotnet!main + 144
00007FFFFFFFE8E0 00007FFFF6D6B830 libc.so.6!__libc_start_main + 240 at /build/glibc-LK5gWL/glibc-2.23/csu/libc-start.c:325
00007FFFFFFFE9A0 0000555555557A0A dotnet!_start + 41
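
The two-pass flow described above can be modeled with a small standalone sketch. This is not CoreCLR code: Frame, Pass1Result, FindHandler, FramesUnwoundInPass2 and ExampleReflectionStack are hypothetical names, the filter callback only stands in for a NativeExceptionHolder's InvokeFilter, and placing the accepting holder on the RuntimeMethodHandle::InvokeMethod frame is an assumption based on where the description says the C++ catch lives. Managed catch clauses are left out entirely.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one stack frame; not a CoreCLR type.
struct Frame {
    std::string name;
    bool managed;                  // true: walked by the managed unwinder
    std::function<bool()> filter;  // stand-in for a native holder's filter; empty if none
};

struct Pass1Result {
    bool handlerFound = false;
    std::size_t handlerIndex = 0;        // frame whose filter accepted the exception
    std::size_t nativeFramesWalked = 0;  // frames that needed the native unwinder
};

// Pass 1: scan outwards from the throwing frame without unwinding anything.
Pass1Result FindHandler(const std::vector<Frame>& stack) {
    Pass1Result r;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        const Frame& f = stack[i];
        if (f.managed) continue;         // managed frames do not need the native unwinder
        ++r.nativeFramesWalked;          // this step is where a libunwind-style unwinder is used
        if (f.filter && f.filter()) {    // a native holder claims the exception
            r.handlerFound = true;
            r.handlerIndex = i;
            return r;
        }
    }
    return r;
}

// Pass 2: list the frames below the handler and who would unwind each of them.
std::vector<std::string> FramesUnwoundInPass2(const std::vector<Frame>& stack,
                                              std::size_t handlerIndex) {
    std::vector<std::string> unwound;
    for (std::size_t i = 0; i < handlerIndex && i < stack.size(); ++i) {
        unwound.push_back(stack[i].name +
                          (stack[i].managed ? " [managed unwinder]" : " [C++ EH]"));
    }
    return unwound;
}

// Innermost frames of the call stack shown above, throwing frame first.
std::vector<Frame> ExampleReflectionStack() {
    return {
        {"reflectioninvoke.Program.Test", true, nullptr},
        {"CallDescrWorkerInternal", false, nullptr},
        {"CallDescrWorkerWithHandler", false, nullptr},
        {"CallDescrWorkerReflectionWrapper", false, nullptr},
        {"RuntimeMethodHandle::InvokeMethod", false, [] { return true; }},
        {"System.Reflection.RuntimeMethodInfo.Invoke", true, nullptr},
    };
}
```

In this model the first pass touches four native frames before a filter accepts the exception, which is the part of the walk that keeps the libunwind dependency in place even if every FCall with a HELPER_METHOD_FRAME were converted.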
