-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mono] Port managed Monitor implementation from NativeAOT #48058
Comments
Tagging subscribers to this area: @CoffeeFlux Issue DetailsThis is another case where we could potentially see runtime size reductions, but it's a bit more complicated than some other Threading classes due to runtime integration. Difficulties:
Also if it will end up using ConditionalWeakTable under the hood, we will need to make sure that's not using monitors for synchronization. And probably we'll need to see if our Ephemerons need perf work. It would probably be best to start with the interpreter and AOT and put the managed impl behind a feature flag.
|
The NativeAOT Monitor design is pretty different - using a managed SyncTable that assigns Lock object to objects (and a CWT of managed Condition variables in case of contention). But I don't think there's anything fundamentally problematic with using it for Mono. But if for some reason we can't do it, or the perf is not reasonable, I wonder if another good step might be to rewrite the mono intrinsics in managed (ie: something like ObjectHeader to describe the lockword, and a C# struct of the inflated lock structure an a big pile of Unsafe calls to manipulate the bits. When there is lock contention we would call down to C (ie continue using the MonoCoopCond under the hood, for now). |
Investigate one incremental alternative from dotnet#48058 Rather than switching to NativeAOT's Monitor implementation in one step, first get access to Mono's synchronization block from managed, and implement the fast paths for hash code and monitor code in C#.
Investigate one incremental alternative from dotnet#48058 Rather than switching to NativeAOT's Monitor implementation in one step, first get access to Mono's synchronization block from managed, and implement the fast paths for hash code and monitor code in C#.
As a sort of baby steps version of this issue, I implemented #81380 which just reimplements some of mono's existing monitor implementation in managed code. Specifically I made the mono object header (sync block and vtable) accessible to C# code and implemented the fast paths of monitor enter/exit and hash codes in C#. The main lesson learned is that Mono's execution engines are very sensitive to how the C# code is implemented and what is handed off to native intrinsics. Specifically:
As a result, I think if we implement monitors completely in managed, we have to be mindful to keep the fast paths fast. If we need to kick out to C for anything (for example allocation or native event creation), it's likely to kill performance on the interpreter. |
* [mono] Implement synch block fast paths in managed Investigate one incremental alternative from #48058 Rather than switching to NativeAOT's Monitor implementation in one step, first get access to Mono's synchronization block from managed, and implement the fast paths for hash code and monitor code in C#. * Implement TryEnterFast and TryEnterInflatedFast This should help the interpreter particularly, since it doesn't have an intrinsic path here * Add replacements for test_owner and test_synchronized icalls ObjectHeader.IsEntered and ObjectHeader.HasOwner, respectively * access sync block through helpers * Get a pointer to the object header another way Unsafe.As<TFrom,TTo>() was converting a object** into a header*. Go through Unsafe.AsPointer to get a Header** and then deref it once to get a header * logical shifts, not arithmetic on wasm32, IntPtr and int are both the same size. so when we store a hash code into _lock_word that ends up with the 30th high bit set, the right shift to get it back out needs to be arithmetic or else we get incorrect hash codes * force inlining * move fast path into ReliableEnterTimeout * Don't pass lockTaken to ObjectHolder.TryEnterFast set it in the caller if we succeeded * TryEnterFast: handle flat recursive locking, too * Implement Monitor.Exit fast path in managed If the LockWord is flat, there has been no contention, so there is noone to signal when we exit * Don't call MonoResolveUnmanagedDll for the default ALC it would always return null anyway. But also this avoids a circular dependency between the managed Default ALC constructor and Monitor icalls (the ALC base constructor locks allContexts). * [interp] don't lookup internal calls that were replaced by an opcode If it was replaced by an intrinsic, we're not really going to call the wrapper, no need to look for it * Add Interlocked.CompareExchange interp intrinsics for I4 and I8 * more inlining: LockWordCompareExchange * Use a ref struct to pass around the address of a MonoObject* on the stack This is similar to ObjectHandleOnStack except with a getter that lets us view the object header. Get rid of calls to GC.KeepAlive. With this, the fast path (locking a flat unowned object) doesn't have any calls on it besides argument validation * Add fast path for Monitor.Exit for inflated monitors with no waiters * Check for obj == null before ThrowIfNull In the interp it's cheaper to do a null check followed by a call than an unconditional call * Inline RuntimeHelpers.GetHashCode * inline TryEnterInflatedFast; nano-opts - change SyncBlock.HashCode to return the value, not a ref - we don't have managed code that writes into the sync block. - change TryEnterInflatedFast to avoid looping - if the CompareExchange fails, fall back to the slow path * disable the TryGetHashCode fast path helper It was actually making things slower on the interpreter. In the JIT it made object.GetHashCode about 50% faster for an object with a pre-computed hash, but that might mean we need the same kind of "no-wrapper; then wrapper" fastpath/slowpath that we have for the Monitor.Enter intrinsic in mini. * Fix last exit from inflated monitor When nest == 1 the caller is responsible for setting the owner to 0 to indicate that the monitor is unlocked. (But we leave the nest count == 1, so that it is correct next time the monitor is entered) * avoid repeated calls in TryEnterInflatedFast * Combine TryExit and IsEntered into TryExitChecked avoid duplicated lockword and sync block manipulations * Re-enable managed GetHashCode in JIT; intrinsify in interp Instead of treating InternalGetHashCode/InternalTryGetHashCode as intrinsics in the interpreter, intrinsify RuntimeHelpers.GetHashCode and RuntimeHelpers.TryGetHashCode and avoid the managed code entirely when possible * add CMPXCHG opcodes to the jiterpreter For the i64 version, pass the address of the i64 values to the helper function. The issue is that since JS doesn't have i64, EMSCRIPTEN_KEEPALIVE messes with the function's signature and we get import errors in the JITed wasm module. Passing by address works around the issue since addresses are i32. * fix whitespace * [jiterp] Allow null obj in hashcode intrinsics The underlying C functions are null-friendly, and the managed intrinsics are allowed to C null inputs. The assert is unnecessary.
This is another case where we could potentially see runtime size reductions, but it's a bit more complicated than some other Threading classes due to runtime integration.
Difficulties:
Also if it will end up using ConditionalWeakTable under the hood, we will need to make sure that's not using monitors for synchronization. And probably we'll need to see if our Ephemerons need perf work.
It would probably be best to start with the interpreter and AOT and put the managed impl behind a feature flag.
The text was updated successfully, but these errors were encountered: