rhel8 arm64 throws NullReferenceExceptions #43349

Closed
tmds opened this issue Oct 13, 2020 · 61 comments
Labels
arch-arm64, area-PAL-coreclr, tracking-external-issue (the issue is caused by an external problem, e.g. the OS; nothing we can do to fix it directly)

@tmds
Member

tmds commented Oct 13, 2020

In our CI builds, each run on RHEL8 arm64 shows NullReferenceExceptions in the log.

On the same arm64 host with a Fedora 32 VM there are no NullReferenceExceptions.
When I build and test on another RHEL8 arm64 machine, NullReferenceExceptions also show up in unexpected places.

Some example stack traces from the CI log:

Microsoft.Extensions.Hosting tests

  System.NullReferenceException : Object reference not set to an instance of an object.
  Stack Trace:
    /home/tester/runtime/src/coreclr/src/System.Private.CoreLib/src/System/Array.CoreCLR.cs(521,0): at System.SZArrayHelper.GetEnumerator[T]()
    /home/tester/runtime/src/libraries/System.Linq/src/System/Linq/Single.cs(136,0): at System.Linq.Enumerable.SingleOrDefault[TSource](IEnumerable`1 source, Func`2 predicate)
       at System.Reflection.NetCoreReflectionExtensions.GetConstructor(Type type, BindingFlags bindingAttr, Object binder, Type[] types, Object[] modifiers)
       at Castle.DynamicProxy.Generators.InterfaceProxyWithTargetGenerator.EnsureValidBaseType(Type type)
       at Castle.DynamicProxy.Generators.InterfaceProxyWithTargetGenerator.GenerateCode(Type proxyTargetType, Type[] interfaces, ProxyGenerationOptions options)
       at Castle.DynamicProxy.DefaultProxyBuilder.CreateInterfaceProxyTypeWithoutTarget(Type interfaceToProxy, Type[] additionalInterfacesToProxy, ProxyGenerationOptions options)
       at Castle.DynamicProxy.ProxyGenerator.CreateInterfaceProxyTypeWithoutTarget(Type interfaceToProxy, Type[] additionalInterfacesToProxy, ProxyGenerationOptions options)
       at Castle.DynamicProxy.ProxyGenerator.CreateInterfaceProxyWithoutTarget(Type interfaceToProxy, Type[] additionalInterfacesToProxy, ProxyGenerationOptions options, IInterceptor[] interceptors)
       at Moq.CastleProxyFactory.CreateProxy(Type mockType, IInterceptor interceptor, Type[] interfaces, Object[] arguments)
       at Moq.Mock`1.InitializeInstance()
       at Moq.Mock`1.OnGetObject()
       at Moq.Mock.get_Object()
       at Moq.Mock`1.get_Object()
    /home/tester/runtime/src/libraries/Microsoft.Extensions.Hosting/tests/UnitTests/Internal/HostTests.cs(583,0): at Microsoft.Extensions.Hosting.Internal.HostTests.<>c__DisplayClass22_0.<HostStopAsyncCanBeCancelledEarly>b__3(IServiceCollection services)
    /home/tester/runtime/src/libraries/Microsoft.Extensions.Hosting/src/HostingHostBuilderExtensions.cs(121,0): at Microsoft.Extensions.Hosting.HostingHostBuilderExtensions.<>c__DisplayClass7_0.<ConfigureServices>b__0(HostBuilderContext context, 

System.Linq.Parallel.Tests

  System.NullReferenceException : Object reference not set to an instance of an object.
  Stack Trace:
    /home/tester/runtime/src/libraries/System.Linq.Parallel/src/System/Linq/Parallel/Enumerables/ParallelQuery.cs(104,0): at System.Linq.ParallelQuery`1.Cast[TCastTo]()
    /home/tester/runtime/src/libraries/System.Linq.Parallel/src/System/Linq/ParallelEnumerable.cs(5271,0): at System.Linq.ParallelEnumerable.Cast[TResult](ParallelQuery source)
    /home/tester/runtime/src/libraries/System.Linq.Parallel/tests/QueryOperators/CastTests.cs(105,0): at System.Linq.Parallel.Tests.CastTests.Cast_Empty(Labeled`1 labeled, Int32 count)

System.Text.Json.Serialization.Tests

  System.NullReferenceException : Object reference not set to an instance of an object.
  Stack Trace:
    /home/tester/runtime/src/libraries/System.Text.Json/tests/Serialization/SerializationWrapper.cs(104,0): at System.Text.Json.Serialization.Tests.SerializationWrapper.WriterSerializerWrapper.SerializeWrapper[T](T value, JsonSerializerOptions options)
    /home/tester/runtime/src/libraries/System.Text.Json/tests/Serialization/PolymorphicTests.cs(125,0): at System.Text.Json.Serialization.Tests.PolymorphicTests.ArrayAsRootObject()
    --- End of stack trace from previous location ---

@janvorli I don't know how to debug this. Can you take a look, or give me some pointers?

cc @omajid

Dotnet-GitSync-Bot added the area-System.Reflection and untriaged (new issue has not been triaged by the area owner) labels Oct 13, 2020
@janvorli
Member

@tmds you can run the test under lldb. The debugger should break right at the place where the null reference happened. Then you can use SOS commands (provided you have SOS installed; see https://github.com/dotnet/diagnostics/blob/master/documentation/installing-sos-instructions.md) to disassemble the managed method, view the call stack including managed frames and their locals and arguments (if they are available in stack slots and current registers), dump managed objects, etc.

Essential SOS commands:

clru <address> - disassemble managed method at a given address
clrstack -f -a - dump stack trace including both native and managed stack frames and arguments and locals of managed methods
dumpobj <address> - dump managed reference object at specified address. For arrays, this doesn't dump its entries.
dumpvc <method_table> <address> - dump managed value type at specified address. The method_table represents the type, you can get it e.g. via name2ee <module_name> <full_type_name>
dumparray <address> - dump managed array at a given address including the entries
dso - dump all objects that can be found on the current thread's stack
verifyheap - verifies integrity of the whole managed heap

Documentation for SOS commands supported on Unix can be found at https://github.com/dotnet/diagnostics/blob/c7bc44208fd1c10abc6d4258eb29de0906d2a22e/src/SOS/Strike/sosdocsunix.txt

Are these exceptions happening in specific tests in a reproducible manner or do they seem to be random, hitting different tests each run?

@janvorli
Member

I also wonder: are you referring to the dotnet/runtime CI or to Red Hat's internal CI?

@janvorli
Member

If it is our CI, I can definitely take a look myself.

@tmds
Member Author

tmds commented Oct 13, 2020

It's on Red Hat internal CI. I don't think dotnet/runtime CI includes rhel8 arm64?

The results differ on each run. In the last two runs Microsoft.Extensions.Hosting.Unit.Tests is present, so I'll try to reproduce using that.

One or more tests failed while running tests from 'Microsoft.Extensions.Hosting.Unit.Tests'. Please check /home/tester/runtime/artifacts/bin/Microsoft.Extensions.Hosting.Unit.Tests/net6.0-Debug/testResults.xml for details! [/home/tester/runtime/src/libraries/Microsoft.Extensions.Hosting/tests/UnitTests/Microsoft.Extensions.Hosting.Unit.Tests.csproj]
One or more tests failed while running tests from 'System.Linq.Parallel.Tests'. Please check /home/tester/runtime/artifacts/bin/System.Linq.Parallel.Tests/net6.0-Debug/testResults.xml for details! [/home/tester/runtime/src/libraries/System.Linq.Parallel/tests/System.Linq.Parallel.Tests.csproj]
One or more tests failed while running tests from 'System.Text.Json.Tests'. Please check /home/tester/runtime/artifacts/bin/System.Text.Json.Tests/net6.0-Debug/testResults.xml for details! [/home/tester/runtime/src/libraries/System.Text.Json/tests/System.Text.Json.Tests.csproj]
One or more tests failed while running tests from 'Microsoft.Extensions.Hosting.Unit.Tests'. Please check /home/tester/runtime/artifacts/bin/Microsoft.Extensions.Hosting.Unit.Tests/net6.0-Debug/testResults.xml for details! [/home/tester/runtime/src/libraries/Microsoft.Extensions.Hosting/tests/UnitTests/Microsoft.Extensions.Hosting.Unit.Tests.csproj]
One or more tests failed while running tests from 'System.Dynamic.Runtime.Tests'. [/home/tester/runtime/src/libraries/System.Dynamic.Runtime/tests/System.Dynamic.Runtime.Tests.csproj]

@tmds
Member Author

tmds commented Oct 13, 2020

> In the last two runs Microsoft.Extensions.Hosting.Unit.Tests is present, so I'll try to reproduce using that.

This didn't work.

@janvorli
Member

This might be related to capturing / restoring context around GC suspension, to FlushProcessWriteBuffers not working, or to something of that kind.
Does the RHEL 8 kernel support MEMBARRIER_CMD_PRIVATE_EXPEDITED? If it does, can you please check whether the s_flushUsingMemBarrier global variable is set to a nonzero value during execution? It is set during PAL initialization in InitializeFlushProcessWriteBuffers, based on the membarrier(MEMBARRIER_CMD_QUERY, 0) result.

@tmds
Member Author

tmds commented Oct 13, 2020

> Does the RHEL 8 kernel support MEMBARRIER_CMD_PRIVATE_EXPEDITED? If it does, can you please check whether the s_flushUsingMemBarrier global variable is set to a nonzero value during execution? It is set during PAL initialization in InitializeFlushProcessWriteBuffers, based on the membarrier(MEMBARRIER_CMD_QUERY, 0) result.

It is supported:

membarrier(MEMBARRIER_CMD_QUERY, 0)     = 0x7f (MEMBARRIER_CMD_GLOBAL|MEMBARRIER_CMD_GLOBAL_EXPEDITED|MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED|MEMBARRIER_CMD_PRIVATE_EXPEDITED|MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED|MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE|MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE)
membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) = 0
(lldb) p s_flushUsingMemBarrier
(int) $0 = 1
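
For readers who want to repeat the kernel check outside of strace, here is a minimal standalone C sketch of the query and of decoding the returned mask. It is only an illustration and not the PAL's InitializeFlushProcessWriteBuffers code; the function names `query_membarrier_mask` and `mask_supports_private_expedited` are hypothetical, and the constants are assumed to come from the kernel's `<linux/membarrier.h>` header.

```c
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel which membarrier commands it supports.
   Returns a bit mask of supported commands, or -1 if the syscall fails. */
int query_membarrier_mask(void)
{
    return (int)syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);
}

/* Hypothetical helper: returns 1 when the mask advertises both the private
   expedited barrier and the registration command that enables it, else 0. */
int mask_supports_private_expedited(int mask)
{
    if (mask < 0)
        return 0; /* the query itself failed */
    return (mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED) != 0 &&
           (mask & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED) != 0;
}
```

Passing the 0x7f value from the strace output to `mask_supports_private_expedited` returns 1, which matches the flag names strace printed.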

I have trouble getting this to reproduce with the debugger. Is there a patch I can make that will call abort() instead of throwing NullReferenceException? What would be a good place to do that? Then I hope I'll be able to collect a coredump when running all tests.

@janvorli
Member

The easiest way is to put abort into the sigsegv_handler here:

static void sigsegv_handler(int code, siginfo_t *siginfo, void *context)

Please note that this would mean that even NullReferenceExceptions that would otherwise be handled would abort, and I guess we have tests that catch these.

@tmds
Member Author

tmds commented Oct 15, 2020

> Please note that this would mean that even NullReferenceExceptions that would otherwise be handled would abort, and I guess we have tests that catch these.

I've made the change and eliminated some tests so they pass on my x64 machine. I ran a few CI arm64 builds but they failed in a different way. I'll run some more, maybe one will produce the coredump we need.

@tmds
Member Author

tmds commented Oct 15, 2020

CI produced two coredumps.

lldb doesn't like them:

# lldb /home/tester/runtime/artifacts/bin/testhost/net6.0-Linux-Debug-arm64/dotnet --core core.9398 
SetSymbolServer -ms  failed
(lldb) target create "/home/tester/runtime/artifacts/bin/testhost/net6.0-Linux-Debug-arm64/dotnet" --core "core.9398"
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
 #0 0x0000ffffa9aa7688 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/lib64/libLLVM-11.so+0xa07688)
 #1 0x0000ffffa9aa5818 llvm::sys::RunSignalHandlers() (/lib64/libLLVM-11.so+0xa05818)
 #2 0x0000ffffa9aa611c (/lib64/libLLVM-11.so+0xa0611c)
 #3 0x0000ffffb1a9066c (linux-vdso.so.1+0x66c)
 #4 0x0000ffffb112a06c (/lib64/liblldb.so.11+0x76a06c)
 #5 0x0000ffffb110220c (/lib64/liblldb.so.11+0x74220c)
 #6 0x0000ffffb10a16bc (/lib64/liblldb.so.11+0x6e16bc)
 #7 0x0000ffffb1093598 (/lib64/liblldb.so.11+0x6d3598)
 #8 0x0000ffffb10cdb9c (/lib64/liblldb.so.11+0x70db9c)
 #9 0x0000ffffb10cdd58 (/lib64/liblldb.so.11+0x70dd58)
#10 0x0000ffffb10d9cd8 (/lib64/liblldb.so.11+0x719cd8)
#11 0x0000ffffb1086e3c (/lib64/liblldb.so.11+0x6c6e3c)
#12 0x0000ffffb1086fe4 (/lib64/liblldb.so.11+0x6c6fe4)
#13 0x0000ffffb1087d38 (/lib64/liblldb.so.11+0x6c7d38)
#14 0x0000ffffb1087ffc (/lib64/liblldb.so.11+0x6c7ffc)
#15 0x0000ffffb0fc1b40 (/lib64/liblldb.so.11+0x601b40)
#16 0x0000ffffb1a47708 start_thread (/lib64/libpthread.so.0+0x7708)
#17 0x0000ffffa8d3187c thread_start (/lib64/libc.so.6+0xd187c)
Segmentation fault (core dumped)

Printing native stacks using gdb:

(gdb) thread apply all bt

Thread 13 (LWP 111750):
#0  0x0000ffff85887808 in fts_read () from /lib64/libc.so.6
#1  0x0000ffff853e1868 in IpcStream::DiagnosticsIpc::Poll (rgIpcPollHandles=0xffff8356e578, nHandles=1, timeoutMs=-1, callback=0x1)
    at /home/tester/runtime/src/coreclr/src/debug/debug-pal/unix/diagnosticsipc.cpp:225
#2  0x0000ffff85246280 in IpcStreamFactory::GetNextAvailableStream (
    callback=0xffff8520f9c4 <DiagnosticServer::DiagnosticsServerThread(void*)::$_0::__invoke(char const*, unsigned int)>)
    at /home/tester/runtime/src/coreclr/src/vm/ipcstreamfactory.cpp:310
#3  0x0000ffff8520eefc in DiagnosticServer::DiagnosticsServerThread () at /home/tester/runtime/src/coreclr/src/vm/diagnosticserver.cpp:56
#4  0x0000ffff854e52ec in CorUnix::CPalThread::ThreadEntry (pvParam=0xaaaafe6c30f0) at /home/tester/runtime/src/coreclr/src/pal/src/thread/thread.cpp:1845
#5  0x0000ffff85c377f8 in start_thread () from /lib64/libpthread.so.0
#6  0x0000ffff85890edc in get_nprocs_conf () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 12 (LWP 111729):
#0  0x0000ffff8588d170 in openlog_internal () from /lib64/libc.so.6
#1  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 11 (LWP 111733):
#0  0x0000ffff85887808 in fts_read () from /lib64/libc.so.6
#1  0x0000ffff854dd368 in CorUnix::CPalSynchronizationManager::ReadBytesFromProcessPipe (this=0x0, iTimeout=<optimized out>, pRecvBuf=0xffff83dbe88c "", 
    iBytes=1) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:2233
#2  0x0000ffff854dca48 in CorUnix::CPalSynchronizationManager::ReadCmdFromProcessPipe (this=0xaaaafe64f190, iPollTimeout=-1, 
    pshridMarshaledData=<optimized out>, pswcWorkerCmd=<optimized out>, pdwData=<optimized out>)
    at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:2011
#3  CorUnix::CPalSynchronizationManager::WorkerThread (pArg=0xaaaafe64f190) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:1714
#4  0x0000ffff854e52ec in CorUnix::CPalThread::ThreadEntry (pvParam=0xaaaafe6500b0) at /home/tester/runtime/src/coreclr/src/pal/src/thread/thread.cpp:1845
#5  0x0000ffff85c377f8 in start_thread () from /lib64/libpthread.so.0
#6  0x0000ffff85890edc in get_nprocs_conf () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 10 (LWP 111756):
#0  0x0000ffff85c3d8d0 in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1  0x0000ffff854dae68 in CorUnix::CPalSynchronizationManager::ThreadNativeWait (ptnwdNativeWaitData=0xaaaafe6e72d0, dwTimeout=<optimized out>, 
    ptwrWakeupReason=0xffff824ee5f4, pdwSignaledObject=0xffff824ee5f0) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:478
#2  0x0000ffff854daacc in CorUnix::CPalSynchronizationManager::BlockThread (this=0xaaaafe64f190, pthrCurrent=0xaaaafe6e7110, dwTimeout=4294967295, 
    fAlertable=false, fIsSleep=<optimized out>, ptwrWakeupReason=0xffff824ee688, pdwSignaledObject=0xffff824ee68c)
    at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:301
#3  0x0000ffff854deed0 in CorUnix::InternalWaitForMultipleObjectsEx (pThread=0xaaaafe6e7110, nCount=<optimized out>, lpHandles=<optimized out>, 
    bWaitAll=<optimized out>, dwMilliseconds=<optimized out>, bAlertable=0, bPrioritize=<optimized out>)
    at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:637
#4  0x0000ffff853d99a4 in DebuggerRCThread::MainLoop (this=0xaaaafe6d7090) at /home/tester/runtime/src/coreclr/src/debug/ee/rcthread.cpp:970
#5  0x0000ffff853d9810 in DebuggerRCThread::ThreadProc (this=0xaaaafe6d7090) at /home/tester/runtime/src/coreclr/src/debug/ee/rcthread.cpp:775
#6  0x0000ffff853d95a8 in DebuggerRCThread::ThreadProcStatic () at /home/tester/runtime/src/coreclr/src/debug/ee/rcthread.cpp:1359
#7  0x0000ffff854e52ec in CorUnix::CPalThread::ThreadEntry (pvParam=0xaaaafe6e7110) at /home/tester/runtime/src/coreclr/src/pal/src/thread/thread.cpp:1845
#8  0x0000ffff85c377f8 in start_thread () from /lib64/libpthread.so.0
#9  0x0000ffff85890edc in get_nprocs_conf () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 9 (LWP 111754):
#0  0x0000ffff85c41b80 in pread64 () from /lib64/libpthread.so.0
#1  0x0000ffff853e1ffc in TwoWayPipe::WaitForConnection (this=0xaaaafe6dbc40) at /home/tester/runtime/src/coreclr/src/debug/debug-pal/unix/twowaypipe.cpp:87
#2  0x0000ffff853dc190 in DbgTransportSession::TransportWorker (this=0xaaaafe6dbb50)
    at /home/tester/runtime/src/coreclr/src/debug/ee/../shared/dbgtransportsession.cpp:1319
#3  0x0000ffff853db288 in DbgTransportSession::TransportWorkerStatic (pvContext=0xffffffffffffff9c)
    at /home/tester/runtime/src/coreclr/src/debug/ee/../shared/dbgtransportsession.cpp:1235
#4  0x0000ffff854e52ec in CorUnix::CPalThread::ThreadEntry (pvParam=0xaaaafe6e67a0) at /home/tester/runtime/src/coreclr/src/pal/src/thread/thread.cpp:1845
#5  0x0000ffff85c377f8 in start_thread () from /lib64/libpthread.so.0
#6  0x0000ffff85890edc in get_nprocs_conf () from /lib64/libc.so.6
#7  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 8 (LWP 111991):
#0  0x0000ffff8587b058 in _getopt_internal_r () from /lib64/libc.so.6
#1  0x0000aaaafe7014f0 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Thread 7 (LWP 111764):
#0  0x0000ffff85c3dd18 in pthread_cond_signal@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1  0x0000ffff85c3dcfc in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#2  0x0000ffff854dae54 in CorUnix::CPalSynchronizationManager::ThreadNativeWait (ptnwdNativeWaitData=0xaaaafe73aea0, dwTimeout=<optimized out>, ptwrWakeupReason=0xffff8089e7a4, pdwSignaledObject=0xffff8089e7a0) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:483
#3  0x0000ffff854daacc in CorUnix::CPalSynchronizationManager::BlockThread (this=0xaaaafe64f190, pthrCurrent=0xaaaafe73ace0, dwTimeout=100, fAlertable=true, fIsSleep=<optimized out>, ptwrWakeupReason=0xffff8089e810, pdwSignaledObject=0xffff8089e814) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:301
#4  0x0000ffff854df73c in CorUnix::InternalSleepEx (pThread=0xaaaafe73ace0, dwMilliseconds=100, bAlertable=1) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:850
#5  SleepEx (dwMilliseconds=100, bAlertable=1) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:285
#6  0x0000ffff851d4214 in ThreadpoolMgr::TimerThreadFire () at /home/tester/runtime/src/coreclr/src/vm/win32threadpool.cpp:4555
#7  0x0000ffff851d40dc in ThreadpoolMgr::TimerThreadStart (p=0xffffdfbe9ef0) at /home/tester/runtime/src/coreclr/src/vm/win32threadpool.cpp:4532
#8  0x0000ffff854e52ec in CorUnix::CPalThread::ThreadEntry (pvParam=0xaaaafe73ace0) at /home/tester/runtime/src/coreclr/src/pal/src/thread/thread.cpp:1845
#9  0x0000ffff85c377f8 in start_thread () from /lib64/libpthread.so.0
#10 0x0000ffff85890edc in get_nprocs_conf () from /lib64/libc.so.6
#11 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 6 (LWP 111974):
#0  0x0000ffff85c3d8d0 in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1  0x0000ffff854dae68 in CorUnix::CPalSynchronizationManager::ThreadNativeWait (ptnwdNativeWaitData=0xaaaafeb28750, dwTimeout=<optimized out>, ptwrWakeupReason=0xffff0b44d9d4, pdwSignaledObject=0xffff0b44d9d0) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:478
#2  0x0000ffff854daacc in CorUnix::CPalSynchronizationManager::BlockThread (this=0xaaaafe64f190, pthrCurrent=0xaaaafeb28590, dwTimeout=4294967295, fAlertable=true, fIsSleep=<optimized out>, ptwrWakeupReason=0xffff0b44da68, pdwSignaledObject=0xffff0b44da6c) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:301
#3  0x0000ffff854deed0 in CorUnix::InternalWaitForMultipleObjectsEx (pThread=0xaaaafeb28590, nCount=<optimized out>, lpHandles=<optimized out>, bWaitAll=<optimized out>, dwMilliseconds=<optimized out>, bAlertable=1, bPrioritize=<optimized out>) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:637
#4  0x0000ffff851b1c70 in Thread::DoAppropriateAptStateWait (this=<optimized out>, numWaiters=<optimized out>, pHandles=0xffff0b44dd18, bWaitAll=1, timeout=<optimized out>, mode=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:3310
#5  Thread::DoAppropriateWaitWorker (this=0xaaaafeb27d70, countHandles=1, handles=0xffff0b44dd18, waitAll=1, millis=4294967295, mode=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:3442
#6  0x0000ffff851acf0c in Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*)::$_0::operator()(Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*)::__EEParam*) const (this=<optimized out>, __pEEParam=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:3159
#7  Thread::DoAppropriateWait (this=0xaaaafeb287ac, countHandles=1, handles=0x0, waitAll=0, millis=189067960, mode=(unknown: 189069488), syncState=0x0) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:3174
#8  0x0000ffff852089a4 in WaitHandleNative::CorWaitOneNative (handle=<optimized out>, timeout=-1) at /home/tester/runtime/src/coreclr/src/vm/comwaithandle.cpp:31
#9  0x0000ffff0c119334 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 5 (LWP 111777):
#0  0x0000ffff85c3dd18 in pthread_cond_signal@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1  0x0000ffff85c3dcfc in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#2  0x0000ffff854dae54 in CorUnix::CPalSynchronizationManager::ThreadNativeWait (ptnwdNativeWaitData=0xfffed8002970, dwTimeout=<optimized out>, ptwrWakeupReason=0xffff807edb94, pdwSignaledObject=0xffff807edb90) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:483
#3  0x0000ffff854daacc in CorUnix::CPalSynchronizationManager::BlockThread (this=0xaaaafe64f190, pthrCurrent=0xfffed80027b0, dwTimeout=12000, fAlertable=true, fIsSleep=<optimized out>, ptwrWakeupReason=0xffff807edc28, pdwSignaledObject=0xffff807edc2c) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:301
#4  0x0000ffff854deed0 in CorUnix::InternalWaitForMultipleObjectsEx (pThread=0xfffed80027b0, nCount=<optimized out>, lpHandles=<optimized out>, bWaitAll=<optimized out>, dwMilliseconds=<optimized out>, bAlertable=1, bPrioritize=<optimized out>) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:637
#5  0x0000ffff851b1c70 in Thread::DoAppropriateAptStateWait (this=<optimized out>, numWaiters=<optimized out>, pHandles=0xffff807eded8, bWaitAll=1, timeout=<optimized out>, mode=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:3310
#6  Thread::DoAppropriateWaitWorker (this=0xfffed80019f0, countHandles=1, handles=0xffff807eded8, waitAll=1, millis=12000, mode=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:3442
#7  0x0000ffff851acf0c in Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*)::$_0::operator()(Thread::DoAppropriateWait(int, void**, int, unsigned int, WaitMode, PendingSync*)::__EEParam*) const (this=<optimized out>, __pEEParam=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:3159
#8  Thread::DoAppropriateWait (this=0xfffed80029cc, countHandles=1, handles=0x0, waitAll=-2139170088, millis=0, mode=(unknown: 2155804848), syncState=0x0) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:3174
#9  0x0000ffff852089a4 in WaitHandleNative::CorWaitOneNative (handle=<optimized out>, timeout=12000) at /home/tester/runtime/src/coreclr/src/vm/comwaithandle.cpp:31
#10 0x0000ffff0c119334 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 4 (LWP 111730):
#0  0x0000ffff8588d170 in openlog_internal () from /lib64/libc.so.6
#1  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (LWP 111780):
#0  0x0000ffff85c3dd18 in pthread_cond_signal@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1  0x0000ffff85c3dcfc in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#2  0x0000ffff854dae54 in CorUnix::CPalSynchronizationManager::ThreadNativeWait (ptnwdNativeWaitData=0xfffed8003350, dwTimeout=<optimized out>, ptwrWakeupReason=0xffff7ffbe7b4, pdwSignaledObject=0xffff7ffbe7b0) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:483
#3  0x0000ffff854daacc in CorUnix::CPalSynchronizationManager::BlockThread (this=0xaaaafe64f190, pthrCurrent=0xfffed8003190, dwTimeout=500, fAlertable=false, fIsSleep=<optimized out>, ptwrWakeupReason=0xffff7ffbe820, pdwSignaledObject=0xffff7ffbe824) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:301
#4  0x0000ffff854df73c in CorUnix::InternalSleepEx (pThread=0xfffed8003190, dwMilliseconds=500, bAlertable=0) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:850
#5  SleepEx (dwMilliseconds=500, bAlertable=0) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:285
#6  0x0000ffff85233910 in ClrSleepEx (dwMilliseconds=3623891880, bAlertable=0) at /home/tester/runtime/src/coreclr/src/vm/hosting.cpp:259
#7  __SwitchToThread (dwSleepMSec=3623891880, dwSwitchCount=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/hosting.cpp:283
#8  0x0000ffff851d38c8 in GateThreadTimer::Wait (this=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/win32threadpool.cpp:4042
#9  ThreadpoolMgr::GateThreadStart (lpArgs=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/win32threadpool.cpp:4140
#10 0x0000ffff854e52ec in CorUnix::CPalThread::ThreadEntry (pvParam=0xfffed8003190) at /home/tester/runtime/src/coreclr/src/pal/src/thread/thread.cpp:1845
#11 0x0000ffff85c377f8 in start_thread () from /lib64/libpthread.so.0
#12 0x0000ffff85890edc in get_nprocs_conf () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (LWP 111759):
#0  0x0000ffff85c3dd18 in pthread_cond_signal@@GLIBC_2.17 () from /lib64/libpthread.so.0
#1  0x0000ffff85c3dcfc in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libpthread.so.0
#2  0x0000ffff854dae54 in CorUnix::CPalSynchronizationManager::ThreadNativeWait (ptnwdNativeWaitData=0xaaaafe6fb370, dwTimeout=<optimized out>, ptwrWakeupReason=0xffff8168e3d4, pdwSignaledObject=0xffff8168e3d0) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:483
#3  0x0000ffff854daacc in CorUnix::CPalSynchronizationManager::BlockThread (this=0xaaaafe64f190, pthrCurrent=0xaaaafe6fb1b0, dwTimeout=2000, fAlertable=false, fIsSleep=<optimized out>, ptwrWakeupReason=0xffff8168e468, pdwSignaledObject=0xffff8168e46c) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/synchmanager.cpp:301
#4  0x0000ffff854deed0 in CorUnix::InternalWaitForMultipleObjectsEx (pThread=0xaaaafe6fb1b0, nCount=<optimized out>, lpHandles=<optimized out>, bWaitAll=<optimized out>, dwMilliseconds=<optimized out>, bAlertable=0, bPrioritize=<optimized out>) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:637
#5  0x0000ffff854df114 in WaitForSingleObjectEx (hHandle=0x94, dwMilliseconds=2000, bAlertable=0) at /home/tester/runtime/src/coreclr/src/pal/src/synchmgr/wait.cpp:138
#6  0x0000ffff852bc49c in CLREventWaitHelper2 (handle=0xaaaafe6fb3c8, dwMilliseconds=128, alertable=0) at /home/tester/runtime/src/coreclr/src/vm/synch.cpp:376
#7  CLREventWaitHelper(void*, unsigned int, int)::$_1::operator()(CLREventWaitHelper(void*, unsigned int, int)::Param*) const (this=<optimized out>, pParam=0xffff8168e660) at /home/tester/runtime/src/coreclr/src/vm/synch.cpp:401
#8  CLREventWaitHelper (dwMilliseconds=<optimized out>, alertable=<optimized out>, handle=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/synch.cpp:403
#9  CLREventBase::WaitEx (this=<optimized out>, dwMilliseconds=<optimized out>, mode=<optimized out>, syncState=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/synch.cpp:470
#10 0x0000ffff8522b268 in FinalizerThread::WaitForFinalizerEvent (event=0xaaaafe6f9e60) at /home/tester/runtime/src/coreclr/src/vm/finalizerthread.cpp:124
#11 0x0000ffff8522b42c in FinalizerThread::FinalizerThreadWorker (args=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/finalizerthread.cpp:252
#12 0x0000ffff851b5730 in ManagedThreadBase_DispatchInner (pCallState=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:7284
#13 ManagedThreadBase_DispatchMiddle (pCallState=0x3b9aca00000000) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:7328
#14 ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const::{lambda(Param*)#1}::operator()(Param*) const (this=<optimized out>, pParam=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:7487
#15 ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::$_6::operator()(ManagedThreadBase_DispatchOuter(ManagedThreadCallState*)::TryArgs*) const (this=<optimized out>, pArgs=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:7489
#16 ManagedThreadBase_DispatchOuter (pCallState=0xffff8168e890) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:7515
#17 0x0000ffff851b5dcc in ManagedThreadBase_NoADTransition (pTarget=<optimized out>, filterType=FinalizerThread) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:7559
#18 ManagedThreadBase::FinalizerBase (pTarget=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/threads.cpp:7585
#19 0x0000ffff8522b668 in FinalizerThread::FinalizerThreadStart (args=<optimized out>) at /home/tester/runtime/src/coreclr/src/vm/finalizerthread.cpp:379
#20 0x0000ffff854e52ec in CorUnix::CPalThread::ThreadEntry (pvParam=0xaaaafe6fb1b0) at /home/tester/runtime/src/coreclr/src/pal/src/thread/thread.cpp:1845
#21 0x0000ffff85c377f8 in start_thread () from /lib64/libpthread.so.0
#22 0x0000ffff85890edc in get_nprocs_conf () from /lib64/libc.so.6
#23 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (LWP 111723):
#0  0x0000ffff857f2c3c in raise () from /lib64/libc.so.6
#1  0x0000ffff857e07a8 in abort () from /lib64/libc.so.6
#2  0x0000ffff854b0230 in sigsegv_handler (code=0, siginfo=0xffff835aeb48, context=0x0) at /home/tester/runtime/src/coreclr/src/pal/src/exception/signal.cpp:514
#3  <signal handler called>
#4  0x0000ffff0c2ea7e8 in ?? ()
#5  0x0000000000000010 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Managed stacks using dotnet-dump:

# dotnet dump analyze core.9398 
Loading core dump: core.9398 ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
> clrstack                                                                                                                                                   
OS Thread Id: 0x1b46b (0)
        Child SP               IP Call Site
> clrstack -a                                                                                                                                                
OS Thread Id: 0x1b46b (0)
        Child SP               IP Call Site
> clrstack -all                                                                                                                                              
OS Thread Id: 0x1b46b
        Child SP               IP Call Site
OS Thread Id: 0x1b48f
        Child SP               IP Call Site
0000FFFF8168E790 0000ffff85c3dd18 [DebuggerU2MCatchHandlerFrame: 0000ffff8168e790] 
OS Thread Id: 0x1b494
        Child SP               IP Call Site
OS Thread Id: 0x1b4a1
        Child SP               IP Call Site
0000FFFF807EDF08 0000ffff85c3dd18 [HelperMethodFrame: 0000ffff807edf08] System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32)
0000FFFF807EE080 0000FFFF0C119334 System.Threading.WaitHandle.WaitOneNoCheck(Int32) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 144]
0000FFFF807EE110 0000FFFF0C2B53F0 System.Threading.WaitHandle.WaitOne(Int32) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 119]
0000FFFF807EE150 0000FFFF0C2B52FC Xunit.DelegatingLongRunningTestDetectionSink.WaitForStopEvent(Int32) [C:\Dev\xunit\xunit\src\xunit.runner.utility\Sinks\DelegatingSinks\DelegatingLongRunningTestDetectionSink.cs @ 169]
0000FFFF807EE180 0000FFFF0C2B4AC8 Xunit.DelegatingLongRunningTestDetectionSink.ThreadWorker() [C:\Dev\xunit\xunit\src\xunit.runner.utility\Sinks\DelegatingSinks\DelegatingLongRunningTestDetectionSink.cs @ 153]
0000FFFF807EE1C0 0000FFFF0C2B45C0 Xunit.Sdk.XunitWorkerThread+<>c.<QueueUserWorkItem>b__4_0(System.Object) [C:\Dev\xunit\xunit\src\common\XunitWorkerThread.cs @ 92]
0000FFFF807EE200 0000FFFF0C2B416C System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute() [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPool.cs @ 853]
0000FFFF807EE230 0000FFFF0C2AFBD8 System.Threading.ThreadPoolWorkQueue.Dispatch() [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPool.cs @ 641]
0000FFFF807EE2A0 0000FFFF0C2AF62C System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() [/home/tester/runtime/src/coreclr/src/System.Private.CoreLib/src/System/Threading/ThreadPool.CoreCLR.cs @ 29]
0000FFFF807EE690 0000ffff85379188 [DebuggerU2MCatchHandlerFrame: 0000ffff807ee690] 
OS Thread Id: 0x1b566
        Child SP               IP Call Site
0000FFFF0B44DD48 0000ffff85c3d8d0 [HelperMethodFrame: 0000ffff0b44dd48] System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32)
0000FFFF0B44DEC0 0000FFFF0C119334 System.Threading.WaitHandle.WaitOneNoCheck(Int32) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 144]
0000FFFF0B44DF50 0000FFFF0C119444 Xunit.Sdk.MessageBus.ReporterWorker() [C:\Dev\xunit\xunit\src\xunit.execution\Sdk\MessageBus.cs @ 80]
0000FFFF0B44DF70 0000FFFF0C117DD0 Xunit.Sdk.XunitWorkerThread+<>c.<QueueUserWorkItem>b__5_0(System.Object) [C:\Dev\xunit\xunit\src\common\XunitWorkerThread.cs @ 37]
0000FFFF0B44DFB0 0000FFFF0C117D44 System.Threading.Tasks.Task.InnerInvoke() [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2377]
0000FFFF0B44DFF0 0000FFFF0C117C74 System.Threading.Tasks.Task+<>c.<.cctor>b__277_0(System.Object) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2359]
0000FFFF0B44E020 0000FFFF0C1173BC System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 186]
0000FFFF0B44E0C0 0000FFFF0C1179BC System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2317]
0000FFFF0B44E180 0000FFFF0C1176E8 System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2255]
0000FFFF0B44E1B0 0000FFFF0C117634 System.Threading.Tasks.ThreadPoolTaskScheduler+<>c.<.cctor>b__10_0(System.Object) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/ThreadPoolTaskScheduler.cs @ 35]
0000FFFF0B44E1E0 0000FFFF0C117560 System.Threading.ThreadHelper.ThreadStart_Context(System.Object) [/home/tester/runtime/src/coreclr/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 47]
0000FFFF0B44E220 0000FFFF0C1173BC System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/home/tester/runtime/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 186]
0000FFFF0B44E2C0 0000FFFF0C1171EC System.Threading.ThreadHelper.ThreadStart(System.Object) [/home/tester/runtime/src/coreclr/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 75]
0000FFFF0B44E750 0000ffff85379188 [DebuggerU2MCatchHandlerFrame: 0000ffff0b44e750] 
OS Thread Id: 0x1b577
        Child SP               IP Call Site

I find it odd that code=0 is passed to sigsegv_handler. Other than that, I don't see much in this output.

@janvorli do you see something useful in here? Are there some other tests I could run that may show something interesting?

@janvorli
Member

Have you opened the dump on the same machine where it was generated?

@tmds
Member Author

tmds commented Oct 15, 2020

At the end of the CI build, the testhost and coredumps are collected. I opened those on another arm64 rhel8 machine.

@tmds
Member Author

tmds commented Nov 5, 2020

I ran a few experiments which suggest the issue is in the rhel8 kernel.

@jkotas added the tracking-external-issue and area-PAL-coreclr labels and removed the area-System.Reflection label on Nov 12, 2020
@tmds
Member Author

tmds commented Jan 4, 2021

@janvorli do you have a suggestion for stand-alone applications I could run that might trigger this issue? So far I'm running the whole build plus tests and see only 1-5 occurrences, at random places. It would be nice to find something smaller, but I don't know what I'm looking for.

@janvorli
Member

janvorli commented Jan 4, 2021

It is hard to say which app could repro the problem when we don't know where it stems from. However, I would recommend trying to repro it by running the coreclr pri 1 tests with GC stress 3 (a checked build of coreclr is needed for that to work). My guess is that this could raise the frequency of the problem considerably and might even make it repro in 90-100% of cases on certain tests. Then you can pick one of those tests and run it under lldb with GC stress enabled to debug it.
To run the coreclr tests with GC stress 3 enabled, you just need to add the --gcstresslevel=3 option to the src/tests/run.sh command line that you use to run the tests. When running a specific test under lldb, you'd use the lldb command env COMPlus_GCStress=3 before the r command. Setting the variable in the shell before launching lldb is not recommended, as it would also influence SOS.
I can also try to repro it locally on my Odroid N2 device. I assume it would likely repro under CentOS 8 too, right? Or is the kernel different there?
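
As a rough sketch of the two invocations described above: the helper below only prints the run.sh command line rather than executing it, so it can be checked without a runtime checkout. The function name is made up for illustration, and the architecture and configuration arguments are assumptions that should be replaced by whatever src/tests/build.sh prints at the end of the test build. The lldb steps appear only as comments, with placeholder paths.

```shell
#!/bin/sh
# Illustrative helper (not part of the runtime repo): print the run.sh
# command line for a GC stress run instead of executing it.
#   $1 = architecture, $2 = build configuration, $3 = GC stress level (default 3)
gcstress_run_cmd() {
    printf './src/tests/run.sh %s %s --gcstresslevel=%s\n' "$1" "$2" "${3:-3}"
}

gcstress_run_cmd arm64 checked 3

# For a single test under lldb, the variable is set inside lldb so that it
# does not leak into SOS. Both paths below are placeholders:
#   lldb -- <Core_Root>/corerun <path/to/test>.dll
#   (lldb) env COMPlus_GCStress=3
#   (lldb) r
```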

@tmds
Member Author

tmds commented Jan 6, 2021

I'm not familiar with running those tests. These are the commands I'm using:

$ ./build.sh clr+libs -rc checked --librariesConfiguration Release /p:NoPgoOptimize=true
$ ./src/tests/build.sh arm64 checked

I got about 25 NullReferenceExceptions during this build command.
Interestingly, they all happen at:

at System.SZArrayHelper.GetEnumerator[T]()

@janvorli maybe this tells you something?
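
To check whether every NullReferenceException in a saved log really has the same top frame, a tally along these lines may help. It is only a sketch: it assumes the log uses the same layout as the xunit traces earlier in this issue (an exception line followed by "at ..." frames), and the function name is made up for illustration.

```shell
#!/bin/sh
# Illustrative helper: count the first "at ..." frame that follows each
# NullReferenceException line. Reads the named files, or stdin if none given.
top_frames() {
    awk '/NullReferenceException/ { want = 1; next }
         want && match($0, /at [A-Za-z_][^(]*\(/) {
             # Drop the leading "at " and the trailing "(".
             frame = substr($0, RSTART + 3, RLENGTH - 4)
             count[frame]++
             want = 0
         }
         END { for (f in count) print count[f], f }' "$@" | sort -rn
}

# Sample input taken from the first stack trace in this issue.
top_frames <<'EOF'
System.NullReferenceException : Object reference not set to an instance of an object.
Stack Trace:
  /home/tester/runtime/src/coreclr/src/System.Private.CoreLib/src/System/Array.CoreCLR.cs(521,0): at System.SZArrayHelper.GetEnumerator[T]()
EOF
```

A single line of output for the whole log would confirm that all occurrences share one top frame; several lines would show how they are distributed.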

I tried to invoke this command a couple of times, but the build doesn't seem to be incremental, so it starts over and errors out at some point.

There are probably some tests I can already run, but I don't know how to start them.
I tried:

$ ./src/tests/run.sh arm64 checked --gcstresslevel=3
Running on  CPU- arm64
testRootDir and other existing arguments is no longer required. If the 
default location is incorrect or does not exist, please use 
--testRootDir to explicitly override the defaults.

Build Architecture            : arm64
Build Configuration           : Checked

python /root/runtime/src/tests/../../src/tests/run.py -arch arm64 -build_type Checked
Error, Core_Root could not be determined, or points to a location that doesn't exist.

I assume it would likely repro under CentOS 8 too, right?

Yes, it should. I had to update binutils to a newer version and lower cmake_minimum_required in src/tests/profiler/native/CMakeLists.txt to the version that comes with RHEL 8.

@janvorli
Member

janvorli commented Jan 6, 2021

OK, if the test build fails consistently, you can build the tests with the two commands you tried, but on a different Linux arm64 distro, and then copy everything under artifacts/tests/Linux.arm64.Checked from the build machine to the same subfolder of the runtime repo on your RHEL 8 machine. If you have the repo at the same absolute path on both machines, source-level debugging should just work after the build.
At the end of the test build, src/tests/build.sh prints the command line for running all tests, so you'd just add --gcstresslevel=3 to it.

If the build of the tests didn't complete, you most likely cannot run anything. The test build has several phases, and the last one builds the test wrappers that allow all the tests to be run using xunit.

@tmds
Member Author

tmds commented Jan 8, 2021

I compiled on Fedora and then ran on RHEL8. Unfortunately, the tests did not run because the glibc version on RHEL8 is older, so I added a Fedora container in the middle to work around it.

I forgot to add the priority1 argument, so I think I ran all tests?

4 tests failed, the errors are below the results table. @janvorli do you see something interesting?

     baseservices.callconvs.XUnitWrapper                  Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    13.447s
     baseservices.compilerservices.XUnitWrapper           Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:    10.011s
     baseservices.exceptions.XUnitWrapper.dll             Total:    0
     baseservices.mono.XUnitWrapper.dll                   Total:    0
     baseservices.threading.XUnitWrapper                  Total:    4, Errors: 0, Failed: 0, Skipped: 0, Time:    12.921s
     baseservices.TieredCompilation.XUnitWrapper          Total:   13, Errors: 0, Failed: 0, Skipped: 0, Time:    77.131s
     baseservices.typeequivalence.XUnitWrapper            Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     5.846s
     baseservices.varargs.XUnitWrapper                    Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:    11.624s
     CoreMangLib.system.XUnitWrapper                      Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    11.355s
     Exceptions.ForeignThread.XUnitWrapper                Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    11.841s
     GC.API.XUnitWrapper                                  Total:   32, Errors: 0, Failed: 0, Skipped: 0, Time:   147.497s
     GC.Coverage.XUnitWrapper                             Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:   456.521s
     GC.Features.XUnitWrapper                             Total:   21, Errors: 0, Failed: 0, Skipped: 0, Time:     0.088s
     GC.LargeMemory.XUnitWrapper                          Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:     0.057s
     GC.Regressions.XUnitWrapper                          Total:   10, Errors: 0, Failed: 0, Skipped: 0, Time:    11.139s
     GC.Scenarios.XUnitWrapper                            Total:   35, Errors: 0, Failed: 0, Skipped: 0, Time:     0.112s
     GC.Stress.XUnitWrapper                               Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     0.061s
     ilasm.PortablePdb.XUnitWrapper                       Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:   343.941s
     ilasm.System.XUnitWrapper                            Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    32.755s
     Interop.ArrayMarshalling.XUnitWrapper                Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    28.288s
     Interop.COM.XUnitWrapper                             Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     8.989s
     Interop.DllImportAttribute.XUnitWrapper              Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:    14.132s
     Interop.ExecInDefAppDom.XUnitWrapper.dll             Total:    0
     Interop.ICastable.XUnitWrapper                       Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    11.434s
     Interop.ICustomMarshaler.XUnitWrapper                Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:   316.382s
     Interop.IDynamicInterfaceCastable.XUnitWrapper       Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    11.933s
     Interop.LayoutClass.XUnitWrapper                     Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     8.572s
     Interop.MarshalAPI.XUnitWrapper                      Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     8.596s
     Interop.NativeLibrary.XUnitWrapper                   Total:    4, Errors: 0, Failed: 0, Skipped: 0, Time:    15.564s
     Interop.PInvoke.XUnitWrapper                         Total:   29, Errors: 0, Failed: 0, Skipped: 0, Time:   180.974s
     Interop.StringMarshalling.XUnitWrapper               Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:    10.225s
     Interop.StructMarshalling.XUnitWrapper               Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:    16.794s
     Interop.StructPacking.XUnitWrapper                   Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    22.516s
     Interop.UnmanagedCallersOnly.XUnitWrapper            Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    16.743s
     JIT.CheckProjects.XUnitWrapper                       Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     0.058s
     JIT.CodeGenBringUpTests.XUnitWrapper                 Total:   32, Errors: 0, Failed: 0, Skipped: 0, Time:    20.796s
     JIT.Directed.XUnitWrapper                            Total:  132, Errors: 0, Failed: 0, Skipped: 0, Time:   912.228s
     JIT.Generics.XUnitWrapper                            Total:   41, Errors: 0, Failed: 0, Skipped: 0, Time:   114.010s
     JIT.HardwareIntrinsics.XUnitWrapper                  Total:  353, Errors: 0, Failed: 0, Skipped: 0, Time:   844.790s
     JIT.IL_Conformance.XUnitWrapper                      Total:  112, Errors: 0, Failed: 0, Skipped: 0, Time:    54.963s
     JIT.Intrinsics.XUnitWrapper                          Total:   23, Errors: 0, Failed: 0, Skipped: 0, Time:    87.320s
     JIT.jit64.XUnitWrapper                               Total:   90, Errors: 0, Failed: 0, Skipped: 0, Time:   591.151s
     JIT.Methodical.XUnitWrapper                          Total:  763, Errors: 0, Failed: 0, Skipped: 0, Time:   654.088s
     JIT.opt.XUnitWrapper                                 Total:   71, Errors: 0, Failed: 0, Skipped: 0, Time:  1017.640s
     JIT.Performance.XUnitWrapper                         Total:   85, Errors: 0, Failed: 0, Skipped: 0, Time:   146.071s
     JIT.Regression.XUnitWrapper                          Total:  629, Errors: 0, Failed: 1, Skipped: 0, Time:   533.548s
     JIT.RyuJIT.XUnitWrapper                              Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    10.755s
     JIT.SIMD.XUnitWrapper                                Total:   99, Errors: 0, Failed: 0, Skipped: 0, Time:    56.267s
     JIT.Stress.XUnitWrapper                              Total:    5, Errors: 0, Failed: 0, Skipped: 0, Time:     0.068s
     JIT.superpmi.XUnitWrapper                            Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:    51.952s
     Loader.AssemblyDependencyResolver.XUnitWrapper       Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    51.067s
     Loader.AssemblyLoadContext30Extensions.XUnitWrapper  Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    11.426s
     Loader.binding.XUnitWrapper                          Total:    2, Errors: 0, Failed: 2, Skipped: 0, Time:  7200.339s
     Loader.classloader.XUnitWrapper                      Total:  108, Errors: 0, Failed: 0, Skipped: 0, Time:    87.212s
     Loader.CollectibleAssemblies.XUnitWrapper            Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:    11.548s
     Loader.ContextualReflection.XUnitWrapper             Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    32.499s
     Loader.regressions.XUnitWrapper                      Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    11.038s
     profiler.elt.XUnitWrapper                            Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:    42.047s
     profiler.eventpipe.XUnitWrapper                      Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:     0.060s
     profiler.gc.XUnitWrapper                             Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:     0.062s
     profiler.rejit.XUnitWrapper                          Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     0.059s
     profiler.transitions.XUnitWrapper                    Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    36.896s
     profiler.unittest.XUnitWrapper                       Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:    63.478s
     readytorun.crossgen2.XUnitWrapper                    Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    42.717s
     readytorun.DynamicMethodGCStress.XUnitWrapper.dll    Total:    0
     readytorun.multifolder.XUnitWrapper                  Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     0.060s
     readytorun.r2rdump.XUnitWrapper                      Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     0.060s
     readytorun.tests.XUnitWrapper                        Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:    20.173s
     reflection.DefaultInterfaceMethods.XUnitWrapper      Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:    17.853s
     reflection.Modifiers.XUnitWrapper                    Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    11.264s
     reflection.SetValue.XUnitWrapper                     Total:    2, Errors: 0, Failed: 0, Skipped: 0, Time:    24.548s
     reflection.StaticInterfaceMembers.XUnitWrapper       Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    10.220s
     Regressions.coreclr.XUnitWrapper                     Total:   23, Errors: 0, Failed: 0, Skipped: 0, Time:    12.071s
     tracing.eventactivityidcontrol.XUnitWrapper          Total:    1, Errors: 0, Failed: 1, Skipped: 0, Time:   118.302s
     tracing.eventcounter.XUnitWrapper                    Total:    5, Errors: 0, Failed: 0, Skipped: 0, Time:     0.064s
     tracing.eventlistener.XUnitWrapper                   Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:    22.035s
     tracing.eventpipe.XUnitWrapper                       Total:   10, Errors: 0, Failed: 0, Skipped: 0, Time:     0.065s
     tracing.eventsource.XUnitWrapper                     Total:    1, Errors: 0, Failed: 0, Skipped: 0, Time:     0.060s
     tracing.tracevalidation.XUnitWrapper                 Total:    3, Errors: 0, Failed: 0, Skipped: 0, Time:     0.062s
                                                                 ----          -          -           -        ----------
                                                    GRAND TOTAL: 2807          0          4           0        14740.480s (14751.982s)
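
The failing wrappers are easy to miss in a table of this length. As a small sketch, assuming the summary has been saved as text in the layout shown above, an awk filter can list only the rows with a non-zero Failed count. The function name is made up for illustration.

```shell
#!/bin/sh
# Illustrative helper: print "<wrapper> <failed count>" for every summary
# row whose Failed count is greater than zero. Reads files or stdin.
list_failed() {
    awk '/XUnitWrapper/ && match($0, /Failed: [0-9]+/) {
             # "Failed: " is 8 characters long; the digits follow it.
             n = substr($0, RSTART + 8, RLENGTH - 8) + 0
             if (n > 0) print $1, n
         }' "$@"
}

# Sample rows copied from the table above.
list_failed <<'EOF'
     JIT.opt.XUnitWrapper                                 Total:   71, Errors: 0, Failed: 0, Skipped: 0, Time:  1017.640s
     JIT.Regression.XUnitWrapper                          Total:  629, Errors: 0, Failed: 1, Skipped: 0, Time:   533.548s
     Loader.binding.XUnitWrapper                          Total:    2, Errors: 0, Failed: 2, Skipped: 0, Time:  7200.339s
     tracing.eventactivityidcontrol.XUnitWrapper          Total:    1, Errors: 0, Failed: 1, Skipped: 0, Time:   118.302s
     readytorun.DynamicMethodGCStress.XUnitWrapper.dll    Total:    0
EOF
```

Run over the full table, this should report JIT.Regression, Loader.binding and tracing.eventactivityidcontrol, which together account for the 4 failures in the grand total.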
/root/runtime/src/tests/Common/tests.targets(74,5): error MSB3073: The command "/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/corerun /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/xunit/xunit.console.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/callconvs/baseservices.callconvs.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/compilerservices/baseservices.compilerservices.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/exceptions/baseservices.exceptions.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/mono/baseservices.mono.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/threading/baseservices.threading.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/TieredCompilation/baseservices.TieredCompilation.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/typeequivalence/baseservices.typeequivalence.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/varargs/baseservices.varargs.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/CoreMangLib/system/CoreMangLib.system.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Exceptions/ForeignThread/Exceptions.ForeignThread.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/API/GC.API.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Coverage/GC.Coverage.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Features/GC.Features.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/LargeMemory/GC.LargeMemory.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Regressions/GC.Regressions.XUnitWrapper.dll 
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Scenarios/GC.Scenarios.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Stress/GC.Stress.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/ilasm/PortablePdb/ilasm.PortablePdb.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/ilasm/System/ilasm.System.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/ArrayMarshalling/Interop.ArrayMarshalling.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/COM/Interop.COM.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/DllImportAttribute/Interop.DllImportAttribute.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/ExecInDefAppDom/Interop.ExecInDefAppDom.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/ICastable/Interop.ICastable.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/ICustomMarshaler/Interop.ICustomMarshaler.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/IDynamicInterfaceCastable/Interop.IDynamicInterfaceCastable.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/LayoutClass/Interop.LayoutClass.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/MarshalAPI/Interop.MarshalAPI.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/NativeLibrary/Interop.NativeLibrary.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/PInvoke/Interop.PInvoke.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/StringMarshalling/Interop.StringMarshalling.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/StructMarshalling/Interop.StructMarshalling.XUnitWrapper.dll 
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/StructPacking/Interop.StructPacking.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/UnmanagedCallersOnly/Interop.UnmanagedCallersOnly.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/CheckProjects/JIT.CheckProjects.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/CodeGenBringUpTests/JIT.CodeGenBringUpTests.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Directed/JIT.Directed.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Generics/JIT.Generics.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/HardwareIntrinsics/JIT.HardwareIntrinsics.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/IL_Conformance/JIT.IL_Conformance.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Intrinsics/JIT.Intrinsics.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/jit64/JIT.jit64.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Methodical/JIT.Methodical.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/opt/JIT.opt.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Performance/JIT.Performance.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Regression/JIT.Regression.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/RyuJIT/JIT.RyuJIT.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/SIMD/JIT.SIMD.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Stress/JIT.Stress.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/superpmi/JIT.superpmi.XUnitWrapper.dll 
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/AssemblyDependencyResolver/Loader.AssemblyDependencyResolver.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/AssemblyLoadContext30Extensions/Loader.AssemblyLoadContext30Extensions.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/Loader.binding.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/classloader/Loader.classloader.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/CollectibleAssemblies/Loader.CollectibleAssemblies.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/ContextualReflection/Loader.ContextualReflection.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/regressions/Loader.regressions.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/elt/profiler.elt.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/eventpipe/profiler.eventpipe.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/gc/profiler.gc.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/rejit/profiler.rejit.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/transitions/profiler.transitions.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/unittest/profiler.unittest.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/crossgen2/readytorun.crossgen2.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/DynamicMethodGCStress/readytorun.DynamicMethodGCStress.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/multifolder/readytorun.multifolder.XUnitWrapper.dll 
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/r2rdump/readytorun.r2rdump.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/tests/readytorun.tests.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/reflection/DefaultInterfaceMethods/reflection.DefaultInterfaceMethods.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/reflection/Modifiers/reflection.Modifiers.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/reflection/SetValue/reflection.SetValue.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/reflection/StaticInterfaceMembers/reflection.StaticInterfaceMembers.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Regressions/coreclr/Regressions.coreclr.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventactivityidcontrol/tracing.eventactivityidcontrol.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventcounter/tracing.eventcounter.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventlistener/tracing.eventlistener.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventpipe/tracing.eventpipe.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventsource/tracing.eventsource.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/tracevalidation/tracing.tracevalidation.XUnitWrapper.dll -parallel collections -html /root/runtime/artifacts/log/TestRun.html -xml /root/runtime/artifacts/log/TestRun.xml  -notrait category=outerloop -notrait category=failing -nocolor" exited with code 1. [/root/runtime/src/tests/run.proj]

Build FAILED.

/root/runtime/src/tests/Common/tests.targets(74,5): error MSB3073: The command "/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/corerun /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/xunit/xunit.console.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/callconvs/baseservices.callconvs.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/compilerservices/baseservices.compilerservices.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/exceptions/baseservices.exceptions.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/mono/baseservices.mono.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/threading/baseservices.threading.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/TieredCompilation/baseservices.TieredCompilation.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/typeequivalence/baseservices.typeequivalence.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/baseservices/varargs/baseservices.varargs.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/CoreMangLib/system/CoreMangLib.system.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Exceptions/ForeignThread/Exceptions.ForeignThread.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/API/GC.API.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Coverage/GC.Coverage.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Features/GC.Features.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/LargeMemory/GC.LargeMemory.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Regressions/GC.Regressions.XUnitWrapper.dll 
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Scenarios/GC.Scenarios.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/GC/Stress/GC.Stress.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/ilasm/PortablePdb/ilasm.PortablePdb.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/ilasm/System/ilasm.System.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/ArrayMarshalling/Interop.ArrayMarshalling.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/COM/Interop.COM.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/DllImportAttribute/Interop.DllImportAttribute.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/ExecInDefAppDom/Interop.ExecInDefAppDom.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/ICastable/Interop.ICastable.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/ICustomMarshaler/Interop.ICustomMarshaler.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/IDynamicInterfaceCastable/Interop.IDynamicInterfaceCastable.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/LayoutClass/Interop.LayoutClass.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/MarshalAPI/Interop.MarshalAPI.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/NativeLibrary/Interop.NativeLibrary.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/PInvoke/Interop.PInvoke.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/StringMarshalling/Interop.StringMarshalling.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/StructMarshalling/Interop.StructMarshalling.XUnitWrapper.dll 
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/StructPacking/Interop.StructPacking.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Interop/UnmanagedCallersOnly/Interop.UnmanagedCallersOnly.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/CheckProjects/JIT.CheckProjects.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/CodeGenBringUpTests/JIT.CodeGenBringUpTests.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Directed/JIT.Directed.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Generics/JIT.Generics.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/HardwareIntrinsics/JIT.HardwareIntrinsics.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/IL_Conformance/JIT.IL_Conformance.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Intrinsics/JIT.Intrinsics.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/jit64/JIT.jit64.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Methodical/JIT.Methodical.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/opt/JIT.opt.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Performance/JIT.Performance.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Regression/JIT.Regression.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/RyuJIT/JIT.RyuJIT.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/SIMD/JIT.SIMD.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Stress/JIT.Stress.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/superpmi/JIT.superpmi.XUnitWrapper.dll 
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/AssemblyDependencyResolver/Loader.AssemblyDependencyResolver.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/AssemblyLoadContext30Extensions/Loader.AssemblyLoadContext30Extensions.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/Loader.binding.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/classloader/Loader.classloader.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/CollectibleAssemblies/Loader.CollectibleAssemblies.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/ContextualReflection/Loader.ContextualReflection.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/regressions/Loader.regressions.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/elt/profiler.elt.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/eventpipe/profiler.eventpipe.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/gc/profiler.gc.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/rejit/profiler.rejit.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/transitions/profiler.transitions.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/profiler/unittest/profiler.unittest.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/crossgen2/readytorun.crossgen2.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/DynamicMethodGCStress/readytorun.DynamicMethodGCStress.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/multifolder/readytorun.multifolder.XUnitWrapper.dll 
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/r2rdump/readytorun.r2rdump.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/readytorun/tests/readytorun.tests.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/reflection/DefaultInterfaceMethods/reflection.DefaultInterfaceMethods.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/reflection/Modifiers/reflection.Modifiers.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/reflection/SetValue/reflection.SetValue.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/reflection/StaticInterfaceMembers/reflection.StaticInterfaceMembers.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Regressions/coreclr/Regressions.coreclr.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventactivityidcontrol/tracing.eventactivityidcontrol.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventcounter/tracing.eventcounter.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventlistener/tracing.eventlistener.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventpipe/tracing.eventpipe.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventsource/tracing.eventsource.XUnitWrapper.dll /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/tracevalidation/tracing.tracevalidation.XUnitWrapper.dll -parallel collections -html /root/runtime/artifacts/log/TestRun.html -xml /root/runtime/artifacts/log/TestRun.xml  -notrait category=outerloop -notrait category=failing -nocolor" exited with code 1. [/root/runtime/src/tests/run.proj]
    0 Warning(s)
    1 Error(s)

Time Elapsed 04:06:14.45
Test run finished.
Parsing test results from (/root/runtime/artifacts/log/TestRunResults_Linux_arm64_Checked)
Analyzing /root/runtime/artifacts/log/testRun.xml
4 failed tests:

/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.Basic/BinderTracingTest.Basic.sh (2 hours 0 minutes)
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.sh (2 hours 0 minutes)
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventactivityidcontrol/eventactivityidcontrol/eventactivityidcontrol.sh (1 minutes 58 seconds)
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Regression/JitBlue/Runtime_45250/Runtime_45250/Runtime_45250.sh (12 seconds)

#################################################################
Output of failing tests:

[/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.Basic/BinderTracingTest.Basic.sh]: 


cmdLine:/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.Basic/BinderTracingTest.Basic.sh Timed Out (timeout in milliseconds: 7200000 from variable __TestTimeout, start: 1/7/2021 5:39:26 PM, end: 1/7/2021 7:39:26 PM)

Return code:      -100
Raw output file:      /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/Reports/Loader.binding/tracing/BinderTracingTest.Basic/BinderTracingTest.Basic.output.txt
Raw output:
BEGIN EXECUTION
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/corerun BinderTracingTest.Basic.dll ''
[5:39:31 PM] Running LoadFile...

cmdLine:/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.Basic/BinderTracingTest.Basic.sh Timed Out (timeout in milliseconds: 7200000 from variable __TestTimeout, start: 1/7/2021 5:39:26 PM, end: 1/7/2021 7:39:26 PM)
Test Harness Exitcode is : -100
To run the test:
> set CORE_ROOT=/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root
> /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.Basic/BinderTracingTest.Basic.sh


[/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.sh]: 


cmdLine:/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.sh Timed Out (timeout in milliseconds: 7200000 from variable __TestTimeout, start: 1/7/2021 5:39:26 PM, end: 1/7/2021 7:39:26 PM)

Return code:      -100
Raw output file:      /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/Reports/Loader.binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.output.txt
Raw output:
BEGIN EXECUTION
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/corerun BinderTracingTest.ResolutionFlow.dll ''
[5:39:35 PM] Running AssemblyLoadContextResolving_ReturnNull...

cmdLine:/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.sh Timed Out (timeout in milliseconds: 7200000 from variable __TestTimeout, start: 1/7/2021 5:39:26 PM, end: 1/7/2021 7:39:26 PM)
Test Harness Exitcode is : -100
To run the test:
> set CORE_ROOT=/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root
> /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Loader/binding/tracing/BinderTracingTest.ResolutionFlow/BinderTracingTest.ResolutionFlow.sh


[/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventactivityidcontrol/eventactivityidcontrol/eventactivityidcontrol.sh]: 


Return code:      1
Raw output file:      /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventactivityidcontrol/Reports/tracing.eventactivityidcontrol/eventactivityidcontrol/eventactivityidcontrol.output.txt
Raw output:
BEGIN EXECUTION
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/corerun eventactivityidcontrol.dll ''
System.Exception: Values for 'activityId' are not equal! Left='0000181c-0000-0000-0000-0000b9719d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000b1c-0000-0000-0000-0000b9649d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000e1c-0000-0000-0000-0000b9679d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='0000001a-0000-0000-0000-0000b7599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00001b1c-0000-0000-0000-0000b9749d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000c1c-0000-0000-0000-0000b9659d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000018-0000-0000-0000-0000b5599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000d1c-0000-0000-0000-0000b9669d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='0000171c-0000-0000-0000-0000b9709d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000019-0000-0000-0000-0000b6599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000012-0000-0000-0000-0000cf599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000014-0000-0000-0000-0000b1599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00001d1c-0000-0000-0000-0000b9769d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000017-0000-0000-0000-0000b4599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000f1c-0000-0000-0000-0000b9689d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000011-0000-0000-0000-0000ce599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000016-0000-0000-0000-0000b3599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000013-0000-0000-0000-0000b0599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00001c1c-0000-0000-0000-0000b9759d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
System.Exception: Values for 'activityId' are not equal! Left='00000015-0000-0000-0000-0000b2599d59' Right='00000000-0000-0000-0000-000000000000'
   at Tracing.Tests.Common.Assert.Equal[T](String name, T left, T right) in /root/runtime/src/tests/tracing/common/Assert.cs:line 33
   at Tracing.Tests.EventActivityIdControlTest.TestThreadProc() in /root/runtime/src/tests/tracing/eventactivityidcontrol/EventActivityIdControl.cs:line 76
Expected: 100
Actual: 0
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root
> /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/tracing/eventactivityidcontrol/eventactivityidcontrol/eventactivityidcontrol.sh


[/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Regression/JitBlue/Runtime_45250/Runtime_45250/Runtime_45250.sh]: 


Assert failure(PID 33332 [0x00008234], Thread: 33332 [0x8234]): Assertion failed '!"Instruction cannot be encoded: IF_DI_2A"' in 'Runtime_45250.Program:Run(FuncGetter)' during 'Generate code' (IL size 22)

    File: /root/runtime/src/coreclr/jit/emitarm64.cpp Line: 5607
    Image: /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/corerun

/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Regression/JitBlue/Runtime_45250/Runtime_45250/Runtime_45250.sh: line 365: 33332 Aborted                 (core dumped) $LAUNCHER $ExePath "${CLRTestExecutionArguments[@]}"

Return code:      1
Raw output file:      /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Regression/Reports/JIT.Regression/JitBlue/Runtime_45250/Runtime_45250/Runtime_45250.output.txt
Raw output:
BEGIN EXECUTION
/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root/corerun Runtime_45250.dll ''
Expected: 100
Actual: 134
END EXECUTION - FAILED
Test Harness Exitcode is : 1
To run the test:
> set CORE_ROOT=/root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/Tests/Core_Root
> /root/runtime/artifacts/tests/coreclr/Linux.arm64.Checked/JIT/Regression/JitBlue/Runtime_45250/Runtime_45250/Runtime_45250.sh



#################################################################
End of output of failing tests
#################################################################


Total tests run    : 2807
Total passing tests: 2803
Total failed tests : 4
Total skipped tests: 0


Creating repro files at: /root/runtime/artifacts/repro/Linux.arm64.Checked
Traceback (most recent call last):
  File "/root/runtime/src/tests/../../src/tests/run.py", line 1643, in <module>
    sys.exit(main(args))
  File "/root/runtime/src/tests/../../src/tests/run.py", line 1633, in main
    create_repro(args, env, tests)
  File "/root/runtime/src/tests/../../src/tests/run.py", line 1603, in create_repro
    debug_env = DebugEnv(args, env, test)
  File "/root/runtime/src/tests/../../src/tests/run.py", line 192, in __init__
    self.__create_repro_wrapper__()
  File "/root/runtime/src/tests/../../src/tests/run.py", line 315, in __create_repro_wrapper__
    self.__create_bash_wrapper__()
  File "/root/runtime/src/tests/../../src/tests/run.py", line 390, in __create_bash_wrapper__
    """ % (self.unique_name, self.core_root)
AttributeError: 'DebugEnv' object has no attribute 'core_root'

@janvorli
Member

janvorli commented Jan 8, 2021

Without the priority1 option, you ran only the priority 0 tests, which are just a fraction of the 10000+ tests. As for the failures, the same tests keep failing on my local Ubuntu 16.04 repo too, so they do not indicate any RHEL 8 specific issue. The -100 exit code means timeout.
So I'd recommend running the pri1 tests to get better coverage. Running in a docker container is also a good way to tell whether the issue you were seeing is in the kernel or in the repo shared libraries, since a container shares the host's kernel.
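
For triaging a log like the one above, a minimal sketch of how the exit codes could be bucketed. It assumes what the log shows: a passing test returns 100 ("Expected: 100"), the harness reports -100 on timeout, and a value above 128 is the shell's 128 + signal number convention (134 = 128 + 6, SIGABRT, which matches the "Aborted (core dumped)" line). The name `classify_exit` and the small signal table are hypothetical and not part of run.py.

```python
# Hypothetical triage helper; not part of the runtime repo's run.py.
PASS_EXIT_CODE = 100      # the log prints "Expected: 100" for a passing test
TIMEOUT_EXIT_CODE = -100  # reported by the harness when __TestTimeout expires

# Linux signal numbers for a few common fatal signals (assumed subset).
LINUX_SIGNALS = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV"}


def classify_exit(code: int) -> str:
    """Return a short label for a test's exit code."""
    if code == PASS_EXIT_CODE:
        return "pass"
    if code == TIMEOUT_EXIT_CODE:
        return "timeout"
    # Shells report "killed by signal N" as exit status 128 + N.
    signal_name = LINUX_SIGNALS.get(code - 128)
    if signal_name is not None:
        return f"terminated by signal {signal_name}"
    return f"failed (exit code {code})"


if __name__ == "__main__":
    # Exit codes seen in the log above: -100, 0 and 134.
    for code in (-100, 0, 134):
        print(code, "->", classify_exit(code))
```

With this split, the two BinderTracingTest entries land in the timeout bucket, while Runtime_45250 is flagged as an abort from the JIT assert rather than an ordinary test failure.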

@tmds
Member Author

tmds commented Jan 11, 2021

@janvorli I ran the pri1 tests. Can you see if there is anything interesting in the results below?

Based on the summary table, these are the additional failures:

     JIT.Methodical.XUnitWrapper                          Total:  2089, Errors: 0, Failed:  1, Skipped: 0, Time:  1214.252s
     Loader.classloader.XUnitWrapper                      Total:  1994, Errors: 0, Failed:  6, Skipped: 0, Time:  1774.524s
     Regressions.coreclr.XUnitWrapper                     Total:    51, Errors: 0, Failed:  1, Skipped: 0, Time:  7200.132s
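
A rough sketch of how rows like these could be pulled out of the summary table automatically instead of by eye. The regular expression is an assumption based only on the row layout shown in this comment (including the short `Total:     0` rows that carry no other fields), and `failing_wrappers` is a hypothetical helper, not something the test scripts provide.

```python
import re

# Matches one summary row. The Errors/Failed/Skipped/Time part is optional
# because some rows in the table only print "Total:     0".
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+Total:\s*(?P<total>\d+)"
    r"(?:,\s*Errors:\s*(?P<errors>\d+),\s*Failed:\s*(?P<failed>\d+),"
    r"\s*Skipped:\s*(?P<skipped>\d+),\s*Time:\s*(?P<time>[\d.]+)s)?\s*$"
)


def failing_wrappers(summary_text: str) -> list[tuple[str, int, int]]:
    """Return (wrapper name, failed count, total count) for rows with failures."""
    results = []
    for line in summary_text.splitlines():
        match = ROW.match(line)
        if match is None or match.group("failed") is None:
            continue  # not a summary row, or a row without a Failed field
        failed = int(match.group("failed"))
        if failed > 0:
            results.append((match.group("name"), failed, int(match.group("total"))))
    return results


if __name__ == "__main__":
    import sys

    # Usage: python failing_wrappers.py < summary.txt
    for name, failed, total in failing_wrappers(sys.stdin.read()):
        print(f"{name}: {failed} of {total} failed")
```

Running it over the pri0 and pri1 summaries and comparing the two result lists would give the "additional failures" directly.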

Full summary table:

  === TEST EXECUTION SUMMARY ===
     baseservices.callconvs.XUnitWrapper                  Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    16.008s
     baseservices.compilerservices.XUnitWrapper           Total:    10, Errors: 0, Failed:  0, Skipped: 0, Time:    32.752s
     baseservices.exceptions.XUnitWrapper                 Total:   135, Errors: 0, Failed:  0, Skipped: 0, Time:   230.857s
     baseservices.finalization.XUnitWrapper               Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     6.511s
     baseservices.mono.XUnitWrapper.dll                   Total:     0
     baseservices.multidimmarray.XUnitWrapper             Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     7.512s
     baseservices.threading.XUnitWrapper                  Total:   342, Errors: 0, Failed:  0, Skipped: 0, Time:   904.008s
     baseservices.TieredCompilation.XUnitWrapper          Total:    13, Errors: 0, Failed:  0, Skipped: 0, Time:    76.961s
     baseservices.typeequivalence.XUnitWrapper            Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     5.528s
     baseservices.varargs.XUnitWrapper                    Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:    11.242s
     CoreMangLib.system.XUnitWrapper                      Total:    62, Errors: 0, Failed:  0, Skipped: 0, Time:    60.271s
     Exceptions.ForeignThread.XUnitWrapper                Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     8.166s
     GC.API.XUnitWrapper                                  Total:    68, Errors: 0, Failed:  0, Skipped: 0, Time:   156.379s
     GC.Coverage.XUnitWrapper                             Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:   471.515s
     GC.Features.XUnitWrapper                             Total:    41, Errors: 0, Failed:  0, Skipped: 0, Time:     0.101s
     GC.LargeMemory.XUnitWrapper                          Total:     6, Errors: 0, Failed:  0, Skipped: 0, Time:     0.059s
     GC.Regressions.XUnitWrapper                          Total:    13, Errors: 0, Failed:  0, Skipped: 0, Time:     6.408s
     GC.Scenarios.XUnitWrapper                            Total:   477, Errors: 0, Failed:  0, Skipped: 0, Time:     0.747s
     GC.Stress.XUnitWrapper                               Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     0.058s
     hosting.stress.XUnitWrapper                          Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:     6.486s
     ilasm.PortablePdb.XUnitWrapper                       Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:   336.600s
     ilasm.System.XUnitWrapper                            Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    33.172s
     Interop.ArrayMarshalling.XUnitWrapper                Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:    23.841s
     Interop.BestFitMapping.XUnitWrapper                  Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    11.499s
     Interop.COM.XUnitWrapper                             Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     8.801s
     Interop.DllImportAttribute.XUnitWrapper              Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:    18.711s
     Interop.ExecInDefAppDom.XUnitWrapper.dll             Total:     0
     Interop.FuncPtrAsDelegateParam.XUnitWrapper          Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     6.508s
     Interop.ICastable.XUnitWrapper                       Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     7.513s
     Interop.ICustomMarshaler.XUnitWrapper                Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:   320.216s
     Interop.IDynamicInterfaceCastable.XUnitWrapper       Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    11.611s
     Interop.LayoutClass.XUnitWrapper                     Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     8.255s
     Interop.MarshalAPI.XUnitWrapper                      Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    11.342s
     Interop.NativeLibrary.XUnitWrapper                   Total:     4, Errors: 0, Failed:  0, Skipped: 0, Time:    11.724s
     Interop.PInvoke.XUnitWrapper                         Total:    29, Errors: 0, Failed:  0, Skipped: 0, Time:   182.667s
     Interop.PrimitiveMarshalling.XUnitWrapper            Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:     6.423s
     Interop.RefCharArray.XUnitWrapper                    Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     6.590s
     Interop.RefInt.XUnitWrapper                          Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    10.644s
     Interop.SimpleStruct.XUnitWrapper                    Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    11.519s
     Interop.StringMarshalling.XUnitWrapper               Total:     5, Errors: 0, Failed:  0, Skipped: 0, Time:    10.180s
     Interop.StructMarshalling.XUnitWrapper               Total:     4, Errors: 0, Failed:  0, Skipped: 0, Time:    16.083s
     Interop.StructPacking.XUnitWrapper                   Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    18.275s
     Interop.UnmanagedCallersOnly.XUnitWrapper            Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    16.125s
     JIT.BBT.XUnitWrapper                                 Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     9.783s
     JIT.CheckProjects.XUnitWrapper                       Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     0.057s
     JIT.CodeGenBringUpTests.XUnitWrapper                 Total:   640, Errors: 0, Failed:  0, Skipped: 0, Time:   236.693s
     JIT.Directed.XUnitWrapper                            Total:   607, Errors: 0, Failed:  0, Skipped: 0, Time:  1078.291s
     JIT.Generics.XUnitWrapper                            Total:   214, Errors: 0, Failed:  0, Skipped: 0, Time:   186.046s
     JIT.HardwareIntrinsics.XUnitWrapper                  Total:   353, Errors: 0, Failed:  0, Skipped: 0, Time:   858.436s
     JIT.IL_Conformance.XUnitWrapper                      Total:   404, Errors: 0, Failed:  0, Skipped: 0, Time:   137.509s
     JIT.Intrinsics.XUnitWrapper                          Total:    23, Errors: 0, Failed:  0, Skipped: 0, Time:    88.618s
     JIT.jit64.XUnitWrapper                               Total:   821, Errors: 0, Failed:  0, Skipped: 0, Time:   895.463s
     JIT.Methodical.XUnitWrapper                          Total:  2089, Errors: 0, Failed:  1, Skipped: 0, Time:  1214.252s
     JIT.opt.XUnitWrapper                                 Total:   182, Errors: 0, Failed:  0, Skipped: 0, Time:  1044.429s
     JIT.Performance.XUnitWrapper                         Total:    85, Errors: 0, Failed:  0, Skipped: 0, Time:   137.188s
     JIT.Regression.XUnitWrapper                          Total:  1461, Errors: 0, Failed:  1, Skipped: 0, Time:   625.658s
     JIT.RyuJIT.XUnitWrapper                              Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     6.736s
     JIT.SIMD.XUnitWrapper                                Total:   111, Errors: 0, Failed:  0, Skipped: 0, Time:    56.037s
     JIT.Stress.XUnitWrapper                              Total:     5, Errors: 0, Failed:  0, Skipped: 0, Time:     0.082s
     JIT.superpmi.XUnitWrapper                            Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:    53.433s
     Loader.AssemblyDependencyResolver.XUnitWrapper       Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:    46.376s
     Loader.AssemblyLoadContext30Extensions.XUnitWrapper  Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    11.140s
     Loader.binding.XUnitWrapper                          Total:    10, Errors: 0, Failed:  2, Skipped: 0, Time:  7200.349s
     Loader.classloader.XUnitWrapper                      Total:  1994, Errors: 0, Failed:  6, Skipped: 0, Time:  1774.524s
     Loader.CollectibleAssemblies.XUnitWrapper            Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:    15.765s
     Loader.ContextualReflection.XUnitWrapper             Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    32.679s
     Loader.lowlevel.XUnitWrapper                         Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     6.473s
     Loader.multimodule.XUnitWrapper                      Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    10.583s
     Loader.NativeLibs.XUnitWrapper                       Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    17.365s
     Loader.regressions.XUnitWrapper                      Total:     4, Errors: 0, Failed:  0, Skipped: 0, Time:     7.984s
     Loader.versioning.XUnitWrapper                       Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    11.171s
     profiler.elt.XUnitWrapper                            Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:    40.209s
     profiler.eventpipe.XUnitWrapper                      Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:     0.061s
     profiler.gc.XUnitWrapper                             Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:     0.060s
     profiler.rejit.XUnitWrapper                          Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     0.062s
     profiler.transitions.XUnitWrapper                    Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    32.755s
     profiler.unittest.XUnitWrapper                       Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:    57.478s
     readytorun.crossboundarylayout.XUnitWrapper          Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     7.108s
     readytorun.crossgen2.XUnitWrapper                    Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    40.862s
     readytorun.DynamicMethodGCStress.XUnitWrapper.dll    Total:     0
     readytorun.multifolder.XUnitWrapper                  Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     0.061s
     readytorun.r2rdump.XUnitWrapper                      Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     0.060s
     readytorun.tests.XUnitWrapper                        Total:     8, Errors: 0, Failed:  0, Skipped: 0, Time:    32.201s
     reflection.DefaultInterfaceMethods.XUnitWrapper      Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:    17.136s
     reflection.ldtoken.XUnitWrapper                      Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:    10.534s
     reflection.Modifiers.XUnitWrapper                    Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    10.831s
     reflection.regression.XUnitWrapper                   Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:    12.112s
     reflection.SetValue.XUnitWrapper                     Total:     2, Errors: 0, Failed:  0, Skipped: 0, Time:    29.089s
     reflection.StaticInterfaceMembers.XUnitWrapper       Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     5.613s
     reflection.Tier1Collectible.XUnitWrapper             Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    28.351s
     Regressions.coreclr.XUnitWrapper                     Total:    51, Errors: 0, Failed:  1, Skipped: 0, Time:  7200.132s
     tracing.eventactivityidcontrol.XUnitWrapper          Total:     1, Errors: 0, Failed:  1, Skipped: 0, Time:   118.236s
     tracing.eventcounter.XUnitWrapper                    Total:     5, Errors: 0, Failed:  0, Skipped: 0, Time:     0.064s
     tracing.eventlistener.XUnitWrapper                   Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:    18.237s
     tracing.eventpipe.XUnitWrapper                       Total:    13, Errors: 0, Failed:  0, Skipped: 0, Time:     0.068s
     tracing.eventsource.XUnitWrapper                     Total:     1, Errors: 0, Failed:  0, Skipped: 0, Time:     0.061s
     tracing.tracevalidation.XUnitWrapper                 Total:     3, Errors: 0, Failed:  0, Skipped: 0, Time:     0.068s
                                                                 -----          -          --           -        ----------
                                                    GRAND TOTAL: 10378          0          12           0        26514.436s (26549.053s)

This is the full log of the test run: pri1.log.
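To pick the failing suites out of a summary like the one above, a small parser is enough. This is only a sketch: it assumes every per-suite row has the `Total/Errors/Failed/Skipped/Time` shape shown in the log, and the function name `failing_suites` is made up for illustration. Rows that only carry `Total: 0` are skipped.

```python
import re

# Matches one per-suite row of the summary, for example:
#   "JIT.opt.XUnitWrapper   Total:   182, Errors: 0, Failed:  0, Skipped: 0, Time:  1044.429s"
ROW = re.compile(
    r"^\s*(?P<suite>\S+)\s+Total:\s*(?P<total>\d+),\s*Errors:\s*(?P<errors>\d+),"
    r"\s*Failed:\s*(?P<failed>\d+),\s*Skipped:\s*(?P<skipped>\d+),\s*Time:\s*(?P<time>[\d.]+)s"
)


def failing_suites(log_text):
    """Return (suite, failed_count) for every row with at least one failure."""
    result = []
    for line in log_text.splitlines():
        m = ROW.match(line)
        if m and int(m.group("failed")) > 0:
            result.append((m.group("suite"), int(m.group("failed"))))
    return result


# Three rows copied from the summary above.
SAMPLE = """\
     JIT.Methodical.XUnitWrapper                          Total:  2089, Errors: 0, Failed:  1, Skipped: 0, Time:  1214.252s
     JIT.opt.XUnitWrapper                                 Total:   182, Errors: 0, Failed:  0, Skipped: 0, Time:  1044.429s
     Loader.classloader.XUnitWrapper                      Total:  1994, Errors: 0, Failed:  6, Skipped: 0, Time:  1774.524s
"""

if __name__ == "__main__":
    for suite, failed in failing_suites(SAMPLE):
        print(f"{suite}: {failed} failed")
```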

@janvorli
Member

The failures with asserts (pMDReal != NULL) || !pCF->IsFrameless() are something we have not seen before. The issue with Assertion failed '!"Instruction cannot be encoded: IF_DI_2A"' in 'BigFrames.Test:Test1(int)' during 'Generate code' (IL size 23715) was recently hit during macOS arm64 bringup and it was caused by incorrect handling of OS page size in JIT in the stack probing code generation. See #42023. Does your RHEL 8 installation have page size different than 4kB?

@tmds
Member Author

tmds commented Jan 11, 2021

Does your RHEL 8 installation have page size different than 4kB?

Yes, it has 64kB pages.
My Fedora arm64 machine, which doesn't give NullReferenceExceptions, has 4kB pages.
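For anyone who wants to compare machines, the page size can be read from a running process. A minimal sketch (the helper name `page_size_bytes` is only for illustration); on Linux it should report the same value that `getconf PAGESIZE` prints:

```python
import mmap
import os


def page_size_bytes():
    """Return the page size the OS reports for this process, in bytes."""
    # SC_PAGE_SIZE is the POSIX name; mmap.PAGESIZE is a portable fallback
    # for platforms where os.sysconf is missing or rejects the name.
    try:
        return os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return mmap.PAGESIZE


if __name__ == "__main__":
    size = page_size_bytes()
    print(f"page size: {size // 1024} kB")
```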

@tmds
Member Author

tmds commented Jan 11, 2021

@janvorli I assume it is very likely this is the root cause for the NullReferenceExceptions?

Thank you for your help!

@janvorli
Member

It could theoretically be the case. To be sure, you can experimentally rebuild the RHEL 8 kernel with page size set to 4kB, run RHEL 8 with it and see if it fixes the issue.

@tmds
Member Author

tmds commented Mar 29, 2021

Maybe a custom section would work.

@janvorli looks like some linkers have a flag for this.

--rosegment
Put read-only non-executable sections in their own segment

@janvorli
Member

@tmds thank you, that sounds great! I'll give it a try.

@tmds
Member Author

tmds commented Apr 13, 2021

@janvorli have you looked at this issue further?

@janvorli
Member

I have tried to use the --rosegment, but the default linker doesn't support that option. Only ld-gold and lld do. I will try to switch the linker to lld, I wanted to do that a long time ago anyways.

@crummel
Contributor

crummel commented Apr 29, 2021

I installed lld on that ARM64 machine too, so it should be ready whenever you are.

@tmds
Member Author

tmds commented May 3, 2021

@janvorli are you making some progress on this issue?

@janvorli
Member

janvorli commented May 3, 2021

I am sorry for the delay. I have just tried to enable linking using lld and just that fixes the issue, as lld by default puts rodata into a non-text segment. I will send out a PR soon.

@sdmaclea
Contributor

sdmaclea commented May 3, 2021

I have had no luck creating a small reproducer for this issue. @sdmaclea may have an idea based on the failures they see on the Apple Silicon.

The null reference exceptions on Apple Silicon were mostly resolved by Apple macOS fixes. They just went away when I updated my machine to macOS 11.3 Beta 6. There could be a few still lingering, but I haven't identified them yet. Our CI machines were just updated to macOS 11.3. I am in the process of re-enabling the previously failing tests. I'll have a better idea of whether any issues remain once those are re-enabled and we have run for a few days/weeks.

Your observation that it is likely kernel related seems believable....

@tmds
Member Author

tmds commented May 4, 2021

Your observation that it is likely kernel related seems believable....

@sdmaclea maybe @janvorli figures something out when he takes a look.
I can ask some kernel engineers to look at the issue, but they will definitely want to have a small reproducer. Do you have an idea what this could be?
We probably don't know what changed in Apple macOS that got rid of the null reference exceptions?

@sdmaclea
Contributor

sdmaclea commented May 4, 2021

Do you have an idea what this could be?

I was guessing

  • a memory barrier/ordering issue
  • incomplete register set preservation in thread preemption or signal handling/return

Basically same opinion as @janvorli ("It seems it might be related to something with capturing / restoring context around GC suspension, the FlushProcessWriteBuffers not working or something of that kind.")

We probably don't know what changed in Apple macOS that got rid of the null reference exceptions?

No.

It also looks like it might have been only one of many issues. The Apple Silicon CI macOS upgrade improved the pass rate, but I still see these null reference exceptions in CI (but not on my local machine). I am going through the differences to see if I can get CI to match my local experience.

@tmds
Member Author

tmds commented May 11, 2021

@janvorli @sdmaclea Once the linker issue is fixed, I think the next step is to find a smaller reproducer.

@janvorli
Member

@tmds I created a PR with a fix last week, but I cannot merge it yet because I need to update our build docker images to include the lld linker. I've hit some issues with those changes that I haven't had a chance to fix yet. But if you want, you can try my PR locally: #52244.

@tmds
Member Author

tmds commented May 11, 2021

@janvorli without thinking much about it I asked our CI to build your branch. The build doesn't work because the SDK that gets downloaded to perform the build still has the rodata in the wrong segment and crashes. Next week, I'll try to build libcoreclr separately and patch the build SDK.

I'm puzzled why other arm64 distros don't have an issue with the s_gsCookie. I think they should run into the same issue (execute permission gets removed from .text).

@janvorli
Member

@tmds I believe the issue doesn't occur if you have 4kB memory pages. Only when the distro has larger pages does the block with the cookie "leak" into code.
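The page-size dependence can be illustrated with some address arithmetic. Memory protection changes apply to whole pages, so what matters is which pages cover the cookie. The sketch below uses a made-up layout (the addresses, the 8-byte size and the helper names are all hypothetical, not taken from a real libcoreclr): the cookie sits in read-only data that starts on a 4 kB boundary right after .text.

```python
def page_range(addr, size, page_size):
    """Return [start, end) of the whole pages covering addr..addr+size."""
    start = addr & ~(page_size - 1)
    end = (addr + size + page_size - 1) & ~(page_size - 1)
    return start, end


def overlaps(a, b):
    """True if half-open ranges a and b share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical layout: .text occupies 0x1000-0x5000 and a read-only data
# block holding an 8-byte cookie starts right after it.
TEXT = (0x1000, 0x5000)
COOKIE_ADDR, COOKIE_SIZE = 0x5008, 8

if __name__ == "__main__":
    for page_size in (4 * 1024, 64 * 1024):
        pages = page_range(COOKIE_ADDR, COOKIE_SIZE, page_size)
        print(f"{page_size // 1024:>2} kB pages: cookie pages "
              f"{pages[0]:#x}-{pages[1]:#x}, touches .text: {overlaps(pages, TEXT)}")
```

With this layout the 4 kB case keeps the cookie's page clear of .text, while the 64 kB page that holds the cookie also holds code, so any protection change made for the cookie would hit that code as well.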

@mangod9
Member

mangod9 commented Jul 6, 2021

@tmds @janvorli is any fix required here for .net 6?

@mangod9 added this to the 6.0.0 milestone on Jul 6, 2021
@mangod9 removed the untriaged (New issue has not been triaged by the area owner) label on Jul 6, 2021
@tmds
Member Author

tmds commented Jul 6, 2021

is any fix required here for .net 6?

Yes.

The cookie issue still causes our builds to fail from the start.
Once that is fixed, and it has rippled into the SDK that gets used from the .dotnet folder, I suspect we'll see the NullReferenceExceptions again.

Our plan is to build .NET 6 for arm64, but this issue needs to be resolved for that.

I've looked at the problem but I couldn't figure out the root cause. I think it is in the kernel.
I can ask kernel engineers to have a look, but they'll want a better reproducer.

@crummel
Contributor

crummel commented Jul 8, 2021

@janvorli Now that preview6 is wrapping up, any idea on when you'll be able to take another look at this?

@janvorli
Member

janvorli commented Jul 9, 2021

I have created a PR in arcade to fix the rootfs build for Alpine 3.9. After consulting with @mthalman, I am going to get my original change to the docker images in, keep building for Alpine 3.9 for now, and move to 3.13 after preview 7. Then I can get in my change to use the lld linker and start looking into the null reference issues. We still have null reference issues on Apple Silicon, so chances are they are related.

@janvorli
Member

janvorli commented Aug 3, 2021

@omajid, @tmds I have tried to run all coreclr pri 1 tests on RHEL 8 with 64kB page size using the latest main and no tests were failing with NullReferenceException anymore.
I had to run the tests manually (enumerating all of the related .sh files and running them with added -coreroot argument), since the Preview 6 SDK / runtime that's normally used to execute xunit doesn't have the fix for the GS cookie mapping issue that I've fixed recently by switching to the lld linker.
Out of all the coreclr pri 1 tests, 10052 succeeded, 29 failed and 3 timed out. 15 of the failures are Unhandled exception. System.InvalidProgramException: Vararg calling convention not supported. A few were caused by the testing methodology (some tests can properly run only via xunit), and the remaining failures are of an unknown kind (no crashes, just error codes meaning the test didn't pass as expected).
So I am closing this issue.
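A manual run of this kind (enumerate the test .sh files and pass each one a -coreroot argument) could be scripted roughly as follows. This is a sketch, not the procedure that was actually used: the helper names are hypothetical, and the `-coreroot=<path>` spelling is an assumption that should be checked against the generated test scripts.

```python
import os
import subprocess


def find_test_scripts(root):
    """Return absolute paths of all .sh files under root, sorted for a stable order."""
    scripts = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".sh"):
                scripts.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(scripts)


def build_command(script, core_root):
    """Build the command line for one test script.

    The "-coreroot=<path>" form is an assumption; adjust it to whatever
    the test scripts actually accept.
    """
    return ["bash", script, f"-coreroot={core_root}"]


def run_all(root, core_root, timeout_s=600):
    """Run every script and return {script: exit code, or "timeout"}."""
    results = {}
    for script in find_test_scripts(root):
        try:
            proc = subprocess.run(
                build_command(script, core_root),
                cwd=os.path.dirname(script),
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            results[script] = proc.returncode
        except subprocess.TimeoutExpired:
            results[script] = "timeout"
    return results
```

Recording timeouts separately from exit codes makes it possible to report "failed" and "timed out" as distinct buckets.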

@janvorli closed this as completed on Aug 3, 2021
@omajid
Member

omajid commented Aug 4, 2021

Thanks, @janvorli ! Any idea when a fix might land such that building runtime works out of the box? Maybe in a month or so?

@tmds
Member Author

tmds commented Aug 4, 2021

I have tried to run all coreclr pri 1 tests on RHEL 8 with 64kB page size using the latest main and no tests were failing with NullReferenceException anymore.

I'm not sure you're running tests in a way that shows the NullReferenceException issue is fixed.
When I ran these tests before, none threw a NullReferenceException (#43349 (comment)). The exceptions happened as part of running the library tests.

The NullReferenceExceptions were happening before we started hitting the GSCookie issue. It's clear #52244 fixes the GSCookie issue (#43349 (comment)), but I don't understand how it fixes the NullReferenceExceptions.

@janvorli
Member

janvorli commented Aug 4, 2021

I believe the NullReferenceException was fixed by another change, #53510. That was what was causing those on macOS arm64 and it was not Apple specific.

@janvorli
Member

janvorli commented Aug 4, 2021

Any idea when a fix might land such that building runtime works out of the box?

The fix will be part of RC1, which will come after preview 7.

@tmds
Member Author

tmds commented Aug 4, 2021

I believe the NullReferenceException was fixed by another change, #53510. That was what was causing those on macOS arm64 and it was not Apple specific.

Great! Thank you for the reference.

@ghost locked as resolved and limited conversation to collaborators on Sep 3, 2021