Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mono] Port managed Monitor implementation from NativeAOT #48058

Open
CoffeeFlux opened this issue Feb 9, 2021 · 4 comments
Open

[mono] Port managed Monitor implementation from NativeAOT #48058

CoffeeFlux opened this issue Feb 9, 2021 · 4 comments

Comments

@CoffeeFlux
Copy link
Contributor

This is another case where we could potentially see runtime size reductions, but it's a bit more complicated than some other Threading classes due to runtime integration.

Difficulties:

  1. Monitor.Enter/TryEnter/Exit are all JIT intrinsics with a fast path and fast slowpath via icall and a slow slowpath via another icall
  2. We don't materialize a lock for every object instance - only the ones where someone tries to create a monitor for them. So there's System.Object layout stuff
  3. There will probably be a perf hit if we don't inline the managed methods. Possibly a memory hit on Android.

Also if it will end up using ConditionalWeakTable under the hood, we will need to make sure that's not using monitors for synchronization. And probably we'll need to see if our Ephemerons need perf work.

It would probably be best to start with the interpreter and AOT and put the managed impl behind a feature flag.

@CoffeeFlux CoffeeFlux added this to the 6.0.0 milestone Feb 9, 2021
@ghost
Copy link

ghost commented Feb 9, 2021

Tagging subscribers to this area: @CoffeeFlux
See info in area-owners.md if you want to be subscribed.

Issue Details

This is another case where we could potentially see runtime size reductions, but it's a bit more complicated than some other Threading classes due to runtime integration.

Difficulties:

  1. Monitor.Enter/TryEnter/Exit are all JIT intrinsics with a fast path and fast slowpath via icall and a slow slowpath via another icall
  2. We don't materialize a lock for every object instance - only the ones where someone tries to create a monitor for them. So there's System.Object layout stuff
  3. There will probably be a perf hit if we don't inline the managed methods. Possibly a memory hit on Android.

Also if it will end up using ConditionalWeakTable under the hood, we will need to make sure that's not using monitors for synchronization. And probably we'll need to see if our Ephemerons need perf work.

It would probably be best to start with the interpreter and AOT and put the managed impl behind a feature flag.

Author: CoffeeFlux
Assignees: -
Labels:

area-VM-meta-mono

Milestone: 6.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 9, 2021
@CoffeeFlux CoffeeFlux removed the untriaged New issue has not been triaged by the area owner label Feb 9, 2021
@CoffeeFlux CoffeeFlux changed the title [mono] Port managed Monitor implementation to wasm from NativeAOT [mono] Port managed Monitor implementation from NativeAOT Feb 16, 2021
@SamMonoRT SamMonoRT modified the milestones: 6.0.0, 7.0.0 Jun 16, 2021
@lambdageek
Copy link
Member

The NativeAOT Monitor design is pretty different - using a managed SyncTable that assigns Lock object to objects (and a CWT of managed Condition variables in case of contention).

But I don't think there's anything fundamentally problematic with using it for Mono.


But if for some reason we can't do it, or the perf is not reasonable, I wonder if another good step might be to rewrite the mono intrinsics in managed (ie: something like ObjectHeader to describe the lockword, and a C# struct of the inflated lock structure an a big pile of Unsafe calls to manipulate the bits. When there is lock contention we would call down to C (ie continue using the MonoCoopCond under the hood, for now).

@SamMonoRT SamMonoRT modified the milestones: 7.0.0, 8.0.0 Aug 4, 2022
lambdageek added a commit to lambdageek/runtime that referenced this issue Jan 31, 2023
Investigate one incremental alternative from dotnet#48058

Rather than switching to NativeAOT's Monitor implementation in one
step, first get access to Mono's synchronization block from managed,
and implement the fast paths for hash code and monitor code in C#.
lambdageek added a commit to lambdageek/runtime that referenced this issue Feb 10, 2023
Investigate one incremental alternative from dotnet#48058

Rather than switching to NativeAOT's Monitor implementation in one
step, first get access to Mono's synchronization block from managed,
and implement the fast paths for hash code and monitor code in C#.
@lambdageek
Copy link
Member

As a sort of baby steps version of this issue, I implemented #81380 which just reimplements some of mono's existing monitor implementation in managed code. Specifically I made the mono object header (sync block and vtable) accessible to C# code and implemented the fast paths of monitor enter/exit and hash codes in C#.

The main lesson learned is that Mono's execution engines are very sensitive to how the C# code is implemented and what is handed off to native intrinsics. Specifically:

  • the JIT and AOT benefit a great deal from having more code in C# - even with no-wrapper icalls,
  • for the interpreter the fastest option is intrinsics that encapsulate an entire operation; second fastest is AggressiveInlining C# code. Calls to native via qcalls and calls to managed are slow
  • The jiterpreter primarily has the same behavior as the interpreter except it's even more important that fast path call contains no managed calls (ie everything is inlined).

As a result, I think if we implement monitors completely in managed, we have to be mindful to keep the fast paths fast. If we need to kick out to C for anything (for example allocation or native event creation), it's likely to kill performance on the interpreter.

lambdageek added a commit that referenced this issue Feb 15, 2023
* [mono] Implement synch block fast paths in managed

   Investigate one incremental alternative from #48058

   Rather than switching to NativeAOT's Monitor implementation in one step, first get access to Mono's synchronization block from managed, and implement the fast paths for hash code and monitor code in C#.

* Implement TryEnterFast and TryEnterInflatedFast

   This should help the interpreter particularly, since it doesn't have an intrinsic path here

* Add replacements for test_owner and test_synchronized icalls

   ObjectHeader.IsEntered and ObjectHeader.HasOwner, respectively

* access sync block through helpers

* Get a pointer to the object header another way

   Unsafe.As<TFrom,TTo>() was converting a object** into a header*.

   Go through Unsafe.AsPointer to get a Header** and then deref it once to get a header

* logical shifts, not arithmetic

   on wasm32, IntPtr and int are both the same size. so when we store a hash code into _lock_word that ends up with the 30th high bit set, the right shift to get it back out needs to be arithmetic or else we get incorrect hash codes

* force inlining

* move fast path into ReliableEnterTimeout

* Don't pass lockTaken to ObjectHolder.TryEnterFast

   set it in the caller if we succeeded

* TryEnterFast: handle flat recursive locking, too

* Implement Monitor.Exit fast path in managed

   If the LockWord is flat, there has been no contention, so there is noone to signal when we exit

* Don't call MonoResolveUnmanagedDll for the default ALC

   it would always return null anyway.

   But also this avoids a circular dependency between the managed Default ALC constructor and Monitor icalls (the ALC base constructor locks allContexts).

* [interp] don't lookup internal calls that were replaced by an opcode

   If it was replaced by an intrinsic, we're not really going to call the wrapper, no need to look for it

* Add Interlocked.CompareExchange interp intrinsics for I4 and I8

* more inlining: LockWordCompareExchange

* Use a ref struct to pass around the address of a MonoObject* on the stack

   This is similar to ObjectHandleOnStack except with a getter that lets us view the object header.

   Get rid of calls to GC.KeepAlive.

   With this, the fast path (locking a flat unowned object) doesn't have any calls on it besides argument validation

* Add fast path for Monitor.Exit for inflated monitors with no waiters

* Check for obj == null before ThrowIfNull

   In the interp it's cheaper to do a null check followed by a call than an unconditional call

* Inline RuntimeHelpers.GetHashCode

* inline TryEnterInflatedFast; nano-opts

   - change SyncBlock.HashCode to return the value, not a ref - we don't have managed code that writes into the sync block.
   - change TryEnterInflatedFast to avoid looping - if the CompareExchange fails, fall back to the slow path

* disable the TryGetHashCode fast path helper

   It was actually making things slower on the interpreter.

   In the JIT it made object.GetHashCode about 50% faster for an object with a pre-computed hash, but that might mean we need the same kind of "no-wrapper; then wrapper" fastpath/slowpath that we have for the Monitor.Enter intrinsic in mini.

* Fix last exit from inflated monitor

   When nest == 1 the caller is responsible for setting the owner to 0 to indicate that the monitor is unlocked.  (But we leave the nest count == 1, so that it is correct next time the monitor is entered)

* avoid repeated calls in TryEnterInflatedFast

* Combine TryExit and IsEntered into TryExitChecked

   avoid duplicated lockword and sync block manipulations

* Re-enable managed GetHashCode in JIT; intrinsify in interp

   Instead of treating InternalGetHashCode/InternalTryGetHashCode as intrinsics in the interpreter, intrinsify RuntimeHelpers.GetHashCode and RuntimeHelpers.TryGetHashCode and avoid the managed code entirely when possible

* add CMPXCHG opcodes to the jiterpreter

   For the i64 version, pass the address of the i64 values to the helper function.

   The issue is that since JS doesn't have i64, EMSCRIPTEN_KEEPALIVE messes with the function's signature and we get import errors in the JITed wasm module.  Passing by address works around the issue since addresses are i32.

* fix whitespace

* [jiterp] Allow null obj in hashcode intrinsics

   The underlying C functions are null-friendly, and the managed intrinsics are allowed to C null inputs.  The assert is unnecessary.
@lambdageek lambdageek modified the milestones: 8.0.0, 9.0.0 Jul 11, 2023
@SamMonoRT
Copy link
Member

cc @fanyang-mono

@steveisok steveisok modified the milestones: 9.0.0, Future Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants