Drop the support of synchronous execution #531
Comments
Big +1 to this proposal from the Chrome team. Thank you for the detailed exploration!
Thank you @huningxin, please feel free to proceed with a PR. For context, this issue was discussed in https://www.w3.org/2024/01/25-webmachinelearning-minutes.html#t05

PR #532 awaits the landing of the PR that addresses this issue.
@huningxin Is there a detailed benchmark result that can be shared? I think 103% / 95% on average is good news to hear, but I wonder whether the performance delta is consistent across all models, or whether there's a certain grouping (i.e. the distribution of the numbers).
I added the details at onnxruntime/pull/19145. The updated results include more models. The average async / sync ratio on webnn-cpu becomes 93.45%, while webnn-gpu is still 103.84%. The newly added models have op fallbacks on webnn-cpu, which decreases the CPU number a bit.
Remove the definition and algorithm steps for:
- ML.createContextSync()
- MLGraphBuilder.buildSync()
- MLContext.computeSync()

Fix webmachinelearning#531
* Remove the definition and algorithm steps for:
  - ML.createContextSync()
  - MLGraphBuilder.buildSync()
  - MLContext.computeSync()
* Use [=reject=] |promise| with a {{TypeError}}
* Abort after rejecting promise in parallel steps

Fix #531
The current WebNN spec supports both asynchronous and synchronous execution modes. In particular, the synchronous execution mode, comprising the `MLContext.computeSync()`, `ML.createContextSync()` and `MLGraphBuilder.buildSync()` methods (available only in dedicated workers), was introduced for easy integration with ML frameworks written in C++ and compiled to Wasm; for example, the ONNXRuntime WebNN EP (Execution Provider) used sync execution before.

The Chromium WebNN prototype supports both execution modes for implementation feedback. The Chrome team encouraged the WG to check whether the sync APIs are really necessary before launch.
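For reference, a minimal sketch of that sync path under the spec drafts of the time, runnable only inside a dedicated worker (the graph and buffer contents are illustrative, not taken from this issue):

```js
// worker.js — synchronous WebNN per the older spec drafts (since removed).
const context = navigator.ml.createContextSync({ deviceType: 'cpu' });
const builder = new MLGraphBuilder(context);

// A trivial graph: c = a + b over 2x2 float32 tensors.
const desc = { type: 'float32', dimensions: [2, 2] };
const a = builder.input('a', desc);
const b = builder.input('b', desc);
const c = builder.add(a, b);
const graph = builder.buildSync({ c });

// computeSync() blocks the worker thread until inference completes.
const inputs = { a: new Float32Array(4).fill(1), b: new Float32Array(4).fill(2) };
const outputs = { c: new Float32Array(4) };
context.computeSync(graph, inputs, outputs);
```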
Recently, the ONNXRuntime WebNN EP experimented (onnxruntime#19145) with the async execution mode and compared its performance against sync. For sync execution, ONNXRuntime runs the WebNN EP in a dedicated worker and calls the WebNN `computeSync()` method there; the JavaScript user code in the main thread communicates (via `postMessage`) with the WebNN EP in the worker thread through the ONNXRuntime Wasm proxy. For async execution, ONNXRuntime runs the WebNN EP in the main thread and calls the WebNN async `compute()` method through asyncify.

According to the test results across 35 models (including CNNs and transformers), the difference in model inference time between the two execution modes is minimal. For GPU, async is even slightly faster than sync (async / sync 103% on average), while for CPU, async is a bit slower than sync (async / sync 95% on average). This is because the WebNN EP on CPU currently supports fewer operators (see the implementation status). For each unsupported op, model inference falls back to running the Wasm op and then returns to compute the next WebNN sub-graph. The more ops that fall back, the more async compute calls there are (and the more asyncify overhead).
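The worker-proxy arrangement described above can be sketched as follows (a simplified illustration of the pattern, not ONNXRuntime's actual proxy code; the message shape and file names are made up):

```js
// main.js — user code stays async; the blocking computeSync() call is
// confined to a dedicated worker.
const worker = new Worker('webnn-worker.js');

function runInference(inputs) {
  return new Promise((resolve, reject) => {
    worker.onmessage = (e) => resolve(e.data.outputs);
    worker.onerror = (e) => reject(e.error);
    worker.postMessage({ inputs });
  });
}

// webnn-worker.js (conceptually): receive inputs, run the blocking call,
// post the results back.
//
//   self.onmessage = (e) => {
//     context.computeSync(graph, e.data.inputs, outputs);
//     self.postMessage({ outputs });
//   };
```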
With more ops supported by the WebNN CPU/XNNPACK backend, there would be fewer op fallbacks, and therefore less asyncify overhead. And with JSPI (JavaScript Promise Integration) coming, that overhead will hopefully become smaller still. The performance of the async execution mode is expected to improve further.
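As a rough illustration of where JSPI could take this: under the API shape of the JS Promise Integration proposal (experimental at the time, so the names below are an assumption), a promise-returning import can suspend the Wasm stack directly, without asyncify's code transformation:

```js
// Sketch only — assumes the proposal's WebAssembly.Suspending /
// WebAssembly.promising surface; 'ep.wasm' and the import/export names
// are hypothetical.
const imports = {
  env: {
    // The Wasm side calls this import as if it were synchronous; JSPI
    // suspends the Wasm stack until the returned promise settles.
    webnn_compute: new WebAssembly.Suspending(
      (graph, inputs, outputs) => context.compute(graph, inputs, outputs)
    ),
  },
};
const { instance } = await WebAssembly.instantiateStreaming(
  fetch('ep.wasm'), imports);

// Wrapping the export lets JS callers receive a promise for the whole run.
const runModel = WebAssembly.promising(instance.exports.run_model);
await runModel();
```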
With onnxruntime#19145 merged, the ONNXRuntime WebNN EP now uses only the WebNN async execution mode and no longer uses sync execution.
Based on this implementation experience, the proposal for the WebNN spec is to remove support for sync execution. That would simplify both the spec and the implementation. Wasm ML frameworks can use the WebNN async methods via asyncify today and migrate to JSPI once it becomes available.
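For comparison with the sync sketch above, this is the async-only surface that remains (same illustrative graph; per the spec drafts around that time, `compute()` transfers the buffers and resolves with fresh views, and the exact result shape has since evolved):

```js
// Asynchronous WebNN — the mode the spec keeps; usable on the main thread.
const context = await navigator.ml.createContext({ deviceType: 'gpu' });
const builder = new MLGraphBuilder(context);

const desc = { type: 'float32', dimensions: [2, 2] };
const a = builder.input('a', desc);
const b = builder.input('b', desc);
const c = builder.add(a, b);
const graph = await builder.build({ c });

const inputs = { a: new Float32Array(4).fill(1), b: new Float32Array(4).fill(2) };
const outputs = { c: new Float32Array(4) };
const result = await context.compute(graph, inputs, outputs);
// result.outputs.c holds the output data.
```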