[mono] Port managed Monitor implementation from NativeAOT #48058

CoffeeFlux · 2021-02-09T15:55:13Z

This is another case where we could potentially see runtime size reductions, but it's a bit more complicated than some other Threading classes due to runtime integration.

Difficulties:

Monitor.Enter/TryEnter/Exit are all JIT intrinsics with a fast path and fast slowpath via icall and a slow slowpath via another icall
We don't materialize a lock for every object instance - only the ones where someone tries to create a monitor for them. So there's System.Object layout stuff
There will probably be a perf hit if we don't inline the managed methods. Possibly a memory hit on Android.

Also if it will end up using ConditionalWeakTable under the hood, we will need to make sure that's not using monitors for synchronization. And probably we'll need to see if our Ephemerons need perf work.

It would probably be best to start with the interpreter and AOT and put the managed impl behind a feature flag.

ghost · 2021-02-09T15:55:16Z

Tagging subscribers to this area: @CoffeeFlux
See info in area-owners.md if you want to be subscribed.

Issue Details

This is another case where we could potentially see runtime size reductions, but it's a bit more complicated than some other Threading classes due to runtime integration.

Difficulties:

Monitor.Enter/TryEnter/Exit are all JIT intrinsics with a fast path and fast slowpath via icall and a slow slowpath via another icall
We don't materialize a lock for every object instance - only the ones where someone tries to create a monitor for them. So there's System.Object layout stuff
There will probably be a perf hit if we don't inline the managed methods. Possibly a memory hit on Android.

Also if it will end up using ConditionalWeakTable under the hood, we will need to make sure that's not using monitors for synchronization. And probably we'll need to see if our Ephemerons need perf work.

It would probably be best to start with the interpreter and AOT and put the managed impl behind a feature flag.

Author:	CoffeeFlux
Assignees:	-
Labels:	`area-VM-meta-mono`
Milestone:	6.0.0

lambdageek · 2021-09-23T20:43:15Z

The NativeAOT Monitor design is pretty different - using a managed SyncTable that assigns Lock object to objects (and a CWT of managed Condition variables in case of contention).

But I don't think there's anything fundamentally problematic with using it for Mono.

But if for some reason we can't do it, or the perf is not reasonable, I wonder if another good step might be to rewrite the mono intrinsics in managed (ie: something like ObjectHeader to describe the lockword, and a C# struct of the inflated lock structure an a big pile of Unsafe calls to manipulate the bits. When there is lock contention we would call down to C (ie continue using the MonoCoopCond under the hood, for now).

Investigate one incremental alternative from dotnet#48058 Rather than switching to NativeAOT's Monitor implementation in one step, first get access to Mono's synchronization block from managed, and implement the fast paths for hash code and monitor code in C#.

lambdageek · 2023-02-14T16:56:43Z

As a sort of baby steps version of this issue, I implemented #81380 which just reimplements some of mono's existing monitor implementation in managed code. Specifically I made the mono object header (sync block and vtable) accessible to C# code and implemented the fast paths of monitor enter/exit and hash codes in C#.

The main lesson learned is that Mono's execution engines are very sensitive to how the C# code is implemented and what is handed off to native intrinsics. Specifically:

the JIT and AOT benefit a great deal from having more code in C# - even with no-wrapper icalls,
for the interpreter the fastest option is intrinsics that encapsulate an entire operation; second fastest is AggressiveInlining C# code. Calls to native via qcalls and calls to managed are slow
The jiterpreter primarily has the same behavior as the interpreter except it's even more important that fast path call contains no managed calls (ie everything is inlined).

As a result, I think if we implement monitors completely in managed, we have to be mindful to keep the fast paths fast. If we need to kick out to C for anything (for example allocation or native event creation), it's likely to kill performance on the interpreter.

* [mono] Implement synch block fast paths in managed Investigate one incremental alternative from #48058 Rather than switching to NativeAOT's Monitor implementation in one step, first get access to Mono's synchronization block from managed, and implement the fast paths for hash code and monitor code in C#. * Implement TryEnterFast and TryEnterInflatedFast This should help the interpreter particularly, since it doesn't have an intrinsic path here * Add replacements for test_owner and test_synchronized icalls ObjectHeader.IsEntered and ObjectHeader.HasOwner, respectively * access sync block through helpers * Get a pointer to the object header another way Unsafe.As<TFrom,TTo>() was converting a object** into a header*. Go through Unsafe.AsPointer to get a Header** and then deref it once to get a header * logical shifts, not arithmetic on wasm32, IntPtr and int are both the same size. so when we store a hash code into _lock_word that ends up with the 30th high bit set, the right shift to get it back out needs to be arithmetic or else we get incorrect hash codes * force inlining * move fast path into ReliableEnterTimeout * Don't pass lockTaken to ObjectHolder.TryEnterFast set it in the caller if we succeeded * TryEnterFast: handle flat recursive locking, too * Implement Monitor.Exit fast path in managed If the LockWord is flat, there has been no contention, so there is noone to signal when we exit * Don't call MonoResolveUnmanagedDll for the default ALC it would always return null anyway. But also this avoids a circular dependency between the managed Default ALC constructor and Monitor icalls (the ALC base constructor locks allContexts). * [interp] don't lookup internal calls that were replaced by an opcode If it was replaced by an intrinsic, we're not really going to call the wrapper, no need to look for it * Add Interlocked.CompareExchange interp intrinsics for I4 and I8 * more inlining: LockWordCompareExchange * Use a ref struct to pass around the address of a MonoObject* on the stack This is similar to ObjectHandleOnStack except with a getter that lets us view the object header. Get rid of calls to GC.KeepAlive. With this, the fast path (locking a flat unowned object) doesn't have any calls on it besides argument validation * Add fast path for Monitor.Exit for inflated monitors with no waiters * Check for obj == null before ThrowIfNull In the interp it's cheaper to do a null check followed by a call than an unconditional call * Inline RuntimeHelpers.GetHashCode * inline TryEnterInflatedFast; nano-opts - change SyncBlock.HashCode to return the value, not a ref - we don't have managed code that writes into the sync block. - change TryEnterInflatedFast to avoid looping - if the CompareExchange fails, fall back to the slow path * disable the TryGetHashCode fast path helper It was actually making things slower on the interpreter. In the JIT it made object.GetHashCode about 50% faster for an object with a pre-computed hash, but that might mean we need the same kind of "no-wrapper; then wrapper" fastpath/slowpath that we have for the Monitor.Enter intrinsic in mini. * Fix last exit from inflated monitor When nest == 1 the caller is responsible for setting the owner to 0 to indicate that the monitor is unlocked. (But we leave the nest count == 1, so that it is correct next time the monitor is entered) * avoid repeated calls in TryEnterInflatedFast * Combine TryExit and IsEntered into TryExitChecked avoid duplicated lockword and sync block manipulations * Re-enable managed GetHashCode in JIT; intrinsify in interp Instead of treating InternalGetHashCode/InternalTryGetHashCode as intrinsics in the interpreter, intrinsify RuntimeHelpers.GetHashCode and RuntimeHelpers.TryGetHashCode and avoid the managed code entirely when possible * add CMPXCHG opcodes to the jiterpreter For the i64 version, pass the address of the i64 values to the helper function. The issue is that since JS doesn't have i64, EMSCRIPTEN_KEEPALIVE messes with the function's signature and we get import errors in the JITed wasm module. Passing by address works around the issue since addresses are i32. * fix whitespace * [jiterp] Allow null obj in hashcode intrinsics The underlying C functions are null-friendly, and the managed intrinsics are allowed to C null inputs. The assert is unnecessary.

SamMonoRT · 2023-07-25T16:18:50Z

cc @fanyang-mono

CoffeeFlux added the area-VM-meta-mono label Feb 9, 2021

CoffeeFlux added this to the 6.0.0 milestone Feb 9, 2021

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 9, 2021

CoffeeFlux removed the untriaged New issue has not been triaged by the area owner label Feb 9, 2021

CoffeeFlux changed the title ~~[mono] Port managed Monitor implementation to wasm from NativeAOT~~ [mono] Port managed Monitor implementation from NativeAOT Feb 16, 2021

CoffeeFlux mentioned this issue Feb 16, 2021

MonoVM runtime size-reduction for WebAssembly #43778

Open

23 tasks

SamMonoRT modified the milestones: 6.0.0, 7.0.0 Jun 16, 2021

SamMonoRT modified the milestones: 7.0.0, 8.0.0 Aug 4, 2022

lambdageek mentioned this issue Aug 9, 2022

[Mono] Task.Wait does not respect SynchronizationContext.IsWaitNotificationRequired under certain circumstances, leading to deadlocks #71855

Open

lambdageek mentioned this issue Jan 30, 2023

[mono] Implement synch block fast paths in managed #81380

Merged

lambdageek mentioned this issue Jun 19, 2023

Mono: Monitor.Enter() is not interrupted by Thread.Interrupt() #87718

Open

lambdageek modified the milestones: 8.0.0, 9.0.0 Jul 11, 2023

lewing assigned artl93 Feb 9, 2024

steveisok unassigned artl93 Jun 28, 2024

steveisok modified the milestones: 9.0.0, Future Jun 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mono] Port managed Monitor implementation from NativeAOT #48058

[mono] Port managed Monitor implementation from NativeAOT #48058

CoffeeFlux commented Feb 9, 2021

ghost commented Feb 9, 2021

lambdageek commented Sep 23, 2021

lambdageek commented Feb 14, 2023

SamMonoRT commented Jul 25, 2023

[mono] Port managed Monitor implementation from NativeAOT #48058

[mono] Port managed Monitor implementation from NativeAOT #48058

Comments

CoffeeFlux commented Feb 9, 2021

ghost commented Feb 9, 2021

lambdageek commented Sep 23, 2021

lambdageek commented Feb 14, 2023

SamMonoRT commented Jul 25, 2023