-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[browser] WASM deputy worker - multi-threading proposal #91696
Conversation
Tagging subscribers to 'arch-wasm': @lewing Issue DetailsThis is proposal on how to enable threads on WASM, while allowing for blocking Feedback and questions are welcome. Contributes to #85592
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice writeup. I'm a little bit worried the model with the deputy thread will be difficult to explain.
- it will actually execute small chunks of C# code | ||
- the pieces which are in the generated code of the JSExport | ||
- containing dispatch to deputy worker's synchronization context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this dispatch need to be done in C#? why can't it be done in JS?
More generally I don't understand why running emscripten off the main thread is a non a workable solution. It has a much more easily understandable execution model, in my opinion - it's just like talking to a server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JS interop is creating JS objects and that needs to happen on the right JS worker, because of affinity.
The JS side of the marshaller is calling static C# methods, to create C# instances of Task
.
To the second question, that's alternative 10. The main reason is that JS interop and also the JS embedding APIs actually needs to make those short-lived synchronous calls to Mono/C#. Otherwise even JS code of marshaling individual arguments would have to be async, because it would send messages to a "server".
The Blazor renderBatch
is also touching memory and converting MonoString*
etc. That could not become async, perf would just die.
It could be done, but it would not be less trouble than having C# on the UI thread internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is more developed version of this "why not aternative 10." lower in the conversation. I will create separate PR/proposal for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #91731
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More generally I don't understand why running emscripten off the main thread is a non a workable solution.
@lambdageek here is more detailed answer #91731 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Short answer: because marshaling JSImport/JSExport parameters requires lot on Mono infrastructure. Like allocating MonoString
, TaskCompletionSource
and many others. Those are mostly synchronous calls and need to be fast. Trying to dispatch those calls from UI to sidecar thread would lead to terrible latency, per parameter. Alternatively we could have double proxy for some of those data types, which is not great either, mostly for GC.
- also the errors would not propagate | ||
- TODO: is there anything special about error propagation over the interop/thread boundary ? | ||
|
||
## Deputy worker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a well-known "JSWebWorker with JSInterop" thread that the ui-thread JS interop knows about? If not, are the different capabilities of this special thread and other JSWebWorker with JS Interop threads required?
Wait, I read more and I thikn I can answer myself. It's just like a JSWebWorker with JSInterop thread except instead of owning "real" JS Interop it owns proxies that represent JS Interop objects of the UI thread instead. Is that right?
So we basically have special "JS event loop" threads and they come in two flavors: 1. they interact with their own js objects, or 2. they just pass messages to some other delegated js object realm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, deputy is similar to JSWebWorker
except it talks to UI thread JS space, not it's own (on logical level).
The implementation of how it works is not JS event loop, but rather C# SynchronizationContext
.
Passing those messages could be also done via emscripten's emscripten_dispatch_to_thread_async
or by JS postMessage
.
Doing it in C# allows us to interleave it with the code generated by Roslyn codegen.
Let's do more thinking about this, how to actually do it.
One of the open questions for me is selection of target thread based on affinity of the JSObject
proxy passed as argument. Note we are talking about static methods which could have more or also none such object.
This question is relevant for "call HTTP from any C# thread" scenario.
- we could avoid it by generating late bound ICall. Very ugly. | ||
- hide `SynchronizationContext.Send` and `SynchronizationContext.Post` inside of the generated code. | ||
- needs to be also inside generated nested marshalers | ||
- is solution for deputy's SynchronizationContext same as for JSWebWorker's SynchronizationContext, from the code-gen perspective ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want the answer to be "yes"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope so too. Over night I realized that shape of this API could be 2 methods on the JSHost
rather than exposing the SynchronizationContext
, because there could more complex logic on which of the should be used.
- how could "HTTP from any C# thread" redirect this to the thread of fetch JS object affinity ? | ||
- should generated code or the SynchronizationContext detect it from passed arguments ? | ||
- TODO: explain why not make user responsible for doing it, instaed of changing generator | ||
- TODO: figure out backward compatibility of already generated code. Must work on single threaded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems hard
## Blazor - what breaks when MT build | ||
- as compared to single threaded runtime, the major difference would be no synchronous callbacks. | ||
- for example from DOM `onClick`. This is one of the reasons people prefer ST WASM over Blazor Server. | ||
- but there is really [no way around it](#problem), because you can't have both MT and sync calls from UI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could allow it but tell users not to use it. This is similar ot guidance in other UI framworks not to block UI threads. They don't prohibit you from running code but then you're on the hook for your UI freezing up (or in this case draining the battery and getting killed by the mobile browser)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have MSBuild property for it, which would be opt-in. Right now, I think if we are doing all of this complex stuff to prevent it, we should gain the benefits of it. Otherwise we will have to support crazy things.
I though that we could get away with spin block. But MT unit tests on CI are hanging in too many cases, and probable for this reason. We just need to get over it, me included.
Yes, but hopefully the users don't need to know that the C# is running elsewhere and the only visible difference would be sync calls from JS and bit of interop perf. This "deputy thread" concept is mostly for this small audience. |
I think you've done a fantastic job coming up with a solution for this problem, but I think it might be best to not solve this problem. While reading over the design I kept thinking "this has a lot in common with blazor server, maybe we can reuse all its remoting and etc for this scenario". The way I see it this proposes a sort of hybrid model between existing blazor WASM and existing blazor server:
I think there's a lot of value in having this hybrid option - it reduces latency vs blazor server, and reduces the resources spent on the server and reduces bandwidth spent communicating between client and server. That's great. But do we actually want to have three models? Will users comprehend all the nuances of the 3 models and know which one to pick for their scenario? Can we communicate that clearly to them? Will users feel betrayed if they commit to model 2 and then discover they actually needed model 1, or vice versa, and it's hard to swap after the fact? Adding a third distinct model with its own approach to interop feels like it adds a lot of education and maintenance burden since we can't say 'everything that applies to Blazor Server applies to this' or 'everything that applies to Blazor WASM applies to this'. Users may have selected Blazor WASM specifically for the ability to seamlessly communicate between C# and JS, which goes away in this new model. I think at the end of the day a lot of code makes assumptions that don't hold for the browser, and the existence of the deputy worker will protect that code from some of its assumptions but the code is still fundamentally broken for this environment. It assumes things that aren't true. It would be ideal if we could make all existing C# concurrency constructs and code work in the browser as-is but I'm not sure if the costs we pay for doing so are justified - the existence of the deputy worker and all the inter-thread coordination will have a measurable cost (hopefully small), startup and shutdown get more complex, debugging gets WAY more complex, etc. This could be especially bad on mobile, where increased resource usage = increased odds of a "tab crash" For the specific scenario of "some code needs to block while waiting for a lock to be released":
You've clearly already thought through the problem scenarios here and all the nuances we have to tackle, like remoting errors across thread boundaries, the thread affinity of JS objects, etc. I'm confident that if we go forward with this design we'll eventually end up with something really good. My concern is ultimately that I don't know how many end users would actually be happy after adopting it, because the code they're trying to run is ill-suited for the web environment. I think our time and resources might be better spent on continued efforts to try and move users away from synchronous blocking models for their web apps, and perhaps improved tooling to ease that transition. On the other hand, maybe the deputy worker model gets users 90% of the way there, and they can fix the other 10% of cases and ship their app and not have to think about it ever again, and their users will be satisfied even if they're unhappy with how the app misbehaves while the deputy worker is hung. That's a real possibility and maybe it's good enough. At the end of the day though I'm not sure I can convince myself that
I'm also mildly concerned that any illusion we manage to maintain in this regard will eventually break. Browser vendors already made the arbitrary decision to ban timers, what's stopping them from deciding to terminate any web worker that runs for too long without yielding to the event loop? or to flip it around: browser vendors got away with this, so why can't we tell people infinite-timeout locks are banned? 😊 In conclusion (this comment has gotten away from me) I think maybe the right decision is instead to gently force developers to take their medicine, and make the medicine taste as good as possible with a set of really good debugging and development tools that build on what we've already got. |
- alloc/free memory | ||
- transform IL -> IR and update Mono VM shared state | ||
- we will spin lock before Blazor `renderBatch` | ||
- to wait for "pause GC" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if we can't pause the GC because the deputy worker is stuck? I guess the browser thread hangs, right?
- we will spin lock before Blazor `renderBatch` | ||
- to wait for "pause GC" | ||
- we will spin lock during GC, if we hit barrier | ||
- TODO: is that short enough ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCs can take hundreds of MS in the real world (I'm not sure why, but our GC under WASM seems to be much slower than it would be on native) so we should assume that users on lower spec devices will see hangs if we have to do this. It might still be the right choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other alternative is to rewrite renderBatch
to streaming bytes.
- has SynchronizationContext installed on it | ||
- So that C# calls could be dispatched to it by runtime | ||
- throw PNSE on attempt to marshal sync C# delegate to UI JavaScript | ||
- can run C# finalizers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalizers already run on a separate thread, not the main thread, AFAIK. the finalizer thread is owned by the GC, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't run finalizers in signe-threaded WASM right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in single-threaded wasm finalizers run in mono_runtime_do_background_work
which is scheduled as a background job when mono_wasm_gc_finalize_notify
is called. In other words after a GC is finished we queue up an idle task to run finalizers.
- So that C# calls could be dispatched to it by runtime | ||
- throw PNSE on attempt to marshal sync C# delegate to UI JavaScript | ||
- can run C# finalizers | ||
- will run GC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we somehow guarantee that GC will never happen on the browser thread? The browser thread will potentially have to stall and wait for GC at the very least, if we ever allow C# to run on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's GC.TryStartNoGCRegion()
(no-op on Mono, I think). It's a weird global mode (so if you have background threads that are quickly allocating, your critical region can still run out of memory), but it might be good enough for some scenarios.
## JSWebWorker with JS interop | ||
- is C# thread created and disposed by new API for it | ||
- could block on synchronization primitives | ||
- there is JSSynchronizationContext installed on it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI since you mentioned performance, be aware that (at present) having any kind of custom synchronization context installed causes some of the Task/Await machinery to go through a slow-path. see #69409 (comment) (though I hope this won't actually matter)
- the `SetResult` need to be marshaled on thread of the Promise. | ||
- The proxy of the Promise knows which `SynchronizationContext` to dispatch to. | ||
- on UI thread it's the UI thread's SynchronizationContext, not deputy's | ||
- TODO: could same task be marshaled to multiple JS workers ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should assume it can and will happen
This is interesting angle. Could we do "blazor server" in the MT (and emscripten started on the worker as in alternative 10), while the UI would just run the same light weight "blazor server"" signalR like protocol via JS only? It would make the user mental model easier. There would be no non-blazor JS interop and no This will behave similar to "blazor server" but it would work offline and low latency. That could solve blazor, but not non-blazor MT is that acceptable outcome ? I will do separate proposal for that, so the we don't confuse discussion here. |
Yes, chrome dev tools experience may be different, because it's will be async with the C# part.
I disagree, I think C# desktop ecosystem is full of locks and MT. |
You're right that this is a difficult problem, but it's one we've already solved in an even more challenging environment (Blazor Server doing these calls over the internet). We already track old, incoming, and new values separately so we can pick the right final state based on what actually changed, and we have an event dispatch system that takes account of the fact that the UI shape can change in between event handlers being exposed and event calls arriving at the server. There are some edge cases where developers run into issues with overlapping state changes, but it's rare in practice because we've learned how common UX patterns interact with network latency and what developers usually want to happen. Latency across threads in the same browser will be less prone to these difficult cases than arbitrary latency across the internet. |
FWIW this is basically the design that VS Code wasm-wasi settled on, too, for different reasons (plugin isolation). |
Co-authored-by: Austin Wise <[email protected]>
Excuse me if I'll be off the point here, but just wanted to share my experience in a similar (as I see it) matter. I'm building a web (PWA) app with C# "backend" compiled to WASM and react frontend communicating in sync/blocking fashion. I've started with .NET 6 where I had to modify the .NET runtime a bit to make it work with the bundlers. I've also added some quality of life stuff on top of the interop, like auto-generating bindings based on C# interfaces, TS types generation, events, etc. It all worked great in the end, DX was basically like the C# part was another JS library consumed on the frontend. Testing and debuging were straightforward. At some point I had an idea of moving the C# "backend" to a web worker. I thought this would ultimately dedicate browser's UI thread to the UI-only stuff, like rendering and animation and improve responsiveness (even though there were no real issues with that at the moment). To minimize breaking changes in the APIs (I've already had a chunky codebase at that point), I've used comlink (which uses JS proxies under the hood) to re-route the interop calls to the webworker. Basically, all the existing APIs were intact, except that they all become async, and blocking calls from JS to C# were no longer possible. In the end, however, I've rolled all that back to blocking. Main reason was debug experience. And it's not about connecting a debugger or anything, but just understanding the flow control. Complex UI frameworks (like React) have lots going on under the hood and when coupled with an async behaviour on the other "side" it just becomes a real pain to work with. The sweet DX of consuming C# API as a JS library goes away and you have to think about it as a remote server. Another point is performance/responsiveness. While the UI was not blocking anymore (though it was a hardly noticable improvement in itself), total time spend on interop became noticeably larger, especially in tight back-and-forth loops. Imho, running the entire C# module on webwoker is not worth it, at least in the case of apps with common "point-click" interfaces (it may be different for something realtime, like games, though in this case input latency will suffer). However, in case of computation-heavy tasks, it sure make sense to off-load them on worker threads. I hoped that maybe it could be possible to add C# API to spawn worker threads just for that. No interop or any sort of communication with the JS side, just pure C# code running on the webworker. Is this at all possible w/o moving the whole runtime to worker as well? One use case for something like this I had in mind is running a language server; ie, our main C# "backend" is a visual editor, which has an optional text editing mode. The text editor communicate with a language server written in C#. Instead of building standalone .NET WASM module just for the server, it would be nice to spawn it from the main C# "backend" to a worker thread. |
Thanks for chiming in, much appreciated! I move the response to side-car design, because your feedback is very relevant there. |
There could be variation on this design, when
Benefits
Especially compared to sidecar option D) Downsides
|
Should this rather go to dotnet/designs repo? |
Closing in favor of dotnet/designs#301 |
This is proposal on how to enable threads on WASM, while allowing for blocking
.Wait
,lock() { ... }
and similar from user C# code.Feedback and questions are welcome.
Contributes to #85592
There is alternative proposal #91731