Memory copies #93

Open
aboba opened this issue Sep 1, 2020 · 5 comments
Labels
Discussion topic Topic discussed at the workshop Opportunities and Challenges Opportunities and Challenges of Browser-Based Machine Learning Web Platform Foundations Web Platform Foundations for Machine Learning

Comments

@aboba
Collaborator

aboba commented Sep 1, 2020

In the presentation on "Machine Learning and Web Media", reference is made to the need for efficiency. Today, machine learning applications operating within the browser media pipeline trigger many additional memory copies compared with native applications, due to the following considerations:

  1. Although QUIC is implemented in user space and zero-copy implementations exist, browser implementations currently copy memory when moving data between C++ and JavaScript. Using BYOB readers/writers in WHATWG Streams does not eliminate these copies (e.g. data is not written to or read directly from the provided buffers).

  2. Handoffs between JS and WebAssembly also result in a memory copy.

  3. Another memory copy may occur when using a transferable stream.

  4. More memory copies can occur when rendering video if zero-copy rasterizer flags are not enabled.
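
To make the distinction between a copy, a view, and a transfer concrete at the JavaScript level, here is a minimal TypeScript sketch. It is not from the original comment: the frame size and variable names are assumptions chosen for illustration, and it says nothing about what a given browser does internally when data crosses the C++/JS boundary.

```typescript
// Sketch only: the 640x480 RGBA frame size is an arbitrary assumption.
const byteLength = 4 * 640 * 480;
const backing = new ArrayBuffer(byteLength);
const frame = new Uint8Array(backing);
frame[0] = 42;

// 1. Copy: slice() allocates a second buffer and duplicates every byte.
const copied = frame.slice();

// 2. View: subarray() wraps the same ArrayBuffer, so no bytes are duplicated.
const viewed = frame.subarray(0, 16);

frame[0] = 7;
const copySawWrite = copied[0] === 7; // false: the copy is independent
const viewSawWrite = viewed[0] === 7; // true: the view aliases `frame`

// 3. Transfer: ownership of the bytes moves to a new ArrayBuffer object and
//    the original is detached, so the data does not need to be duplicated.
const moved = structuredClone(backing, { transfer: [backing] });
const originalDetached = backing.byteLength === 0;
const movedFirstByte = new Uint8Array(moved)[0];

console.log({ copySawWrite, viewSawWrite, originalDetached, movedFirstByte });
```

The considerations listed above are about places where the platform falls back to the first pattern even when the application asked for something closer to the second or third.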

@anssiko anssiko added Opportunities and Challenges Opportunities and Challenges of Browser-Based Machine Learning Web Platform Foundations Web Platform Foundations for Machine Learning labels Sep 3, 2020
@wchao1115

In the Interactive ML talk, Tero mentioned that in one of the high-framerate scenarios they're working on (generating music from a camera feed of video frames), the cost of moving the data around is as high as the cost of ML inference itself. This is especially true when many copies and CPU/GPU uploads/downloads occur, particularly around webcam media and video frames, where the data must first be read into the canvas and then copied out and uploaded. And this is only the good case; the worst case could involve multiple round trips. It would be interesting to explore if there could be a more direct way a video frame from the camera can be fed more directly to ML without too many intermediaries.
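
The canvas path described above can be sketched roughly as follows, in TypeScript. This is an illustration, not code from the talk or from any framework: `Ctx2dLike` and `frameToFloats` are hypothetical names, the structural interface exists only so the sketch compiles outside a browser (a real `CanvasRenderingContext2D` is expected to satisfy it), and the copy counts in the comments are a simplification.

```typescript
// Hypothetical minimal shape of a 2D canvas context, limited to the two
// calls used below: drawImage(image, dx, dy, dw, dh) and getImageData().
interface Ctx2dLike {
  drawImage(source: unknown, dx: number, dy: number, dw: number, dh: number): void;
  getImageData(sx: number, sy: number, sw: number, sh: number): { data: Uint8ClampedArray };
}

// Turns the current video frame into normalized floats, the usual input
// layout for a tensor. `video` would be an HTMLVideoElement in a page.
function frameToFloats(
  ctx: Ctx2dLike,
  video: unknown,
  width: number,
  height: number,
): Float32Array {
  // Copy 1: the decoded frame is drawn into the canvas backing store.
  ctx.drawImage(video, 0, 0, width, height);
  // Copy 2: the pixels are read back into CPU memory as RGBA bytes.
  const rgba = ctx.getImageData(0, 0, width, height).data;
  // Copy 3: the bytes are converted into a freshly allocated float buffer.
  const floats = new Float32Array(rgba.length);
  for (let i = 0; i < rgba.length; i++) {
    floats[i] = rgba[i] / 255;
  }
  // A GPU-backed ML framework would then upload `floats` again (copy 4).
  return floats;
}
```

Each numbered comment marks a point where the frame's pixels are duplicated or moved across the CPU/GPU boundary before inference even starts.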

@anssiko
Member

anssiko commented Sep 8, 2020

Cross-referencing #97, which discusses the overall Action-Response Cycle and its various bottlenecks as outlined in @teropa's talk. My understanding is that memory copies (unnecessary, if we had better APIs) cause bottlenecks in this cycle when pixels are drawn to a canvas from which the pixel tensor is then built. I think with better API integration we could get rid of that bottleneck and avoid crossing the CPU/GPU boundary unnecessarily.

"Media integration" e.g. fast streaming inputs from MediaStream is proposed as one possible solution by @teropa which I think is a similar idea to @wchao1115's "a more direct way a video frame from the camera can be fed more directly to ML without too many intermediaries".

@chun137

chun137 commented Sep 16, 2020

In the past year, we built some proofs of concept (POCs) for real-time video processing based on WebRTC and machine learning, including video super resolution.

As videos in most web engines are processed in hardware and video frames are stored as GPU textures, they can be processed with WebGL. The WebGL API gl.texSubImage2D(... , <video element>) reuses the video texture inside the browser with zero copies. But we found that most web ML frameworks, including tfjs, only receive inputs and provide outputs as ArrayBuffers, even though they have a WebGL backend.

To improve performance, we forked tfjs and added a new input interface that creates a texture tensor from a <video> element directly, along with an output interface that exposes the internal texture result to the web app, so that web app can send

If web ML frameworks supported WebGL texture inputs directly and could output the prediction result as a texture to the app, we could avoid CPU <-> GPU memory copies in JS, and the GPU pipeline would not be interrupted. The performance of real-time video processing would be better.
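
For illustration, the texture-input idea could look roughly like the TypeScript sketch below. It is not the forked tfjs interface, which is not shown in this thread: `GlLike` and `uploadVideoFrame` are hypothetical names, and the structural interface exists only so the sketch compiles outside a browser (a real `WebGLRenderingContext` is expected to satisfy it). It uses the `texImage2D` overload that accepts a video element as its source; `texSubImage2D` has an analogous overload for textures whose storage is already allocated. Whether the browser actually avoids a copy here is implementation-dependent.

```typescript
// Hypothetical minimal shape of a WebGL context, limited to what is used below.
interface GlLike {
  readonly TEXTURE_2D: number;
  readonly RGBA: number;
  readonly UNSIGNED_BYTE: number;
  bindTexture(target: number, texture: unknown): void;
  texImage2D(
    target: number,
    level: number,
    internalformat: number,
    format: number,
    type: number,
    source: unknown,
  ): void;
}

// Hands the current video frame to a texture without going through a canvas
// or an ArrayBuffer in script. `texture` would be a WebGLTexture and `video`
// an HTMLVideoElement in a page.
function uploadVideoFrame(gl: GlLike, texture: unknown, video: unknown): void {
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
}
```

A framework that accepted the resulting texture as a tensor input, and returned its output as a texture, would let the whole pipeline stay on the GPU, which is the point made above.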

@anssiko anssiko added this to the 2020-09-16 Live Session #1 milestone Sep 17, 2020
@tidoust
Member

tidoust commented Sep 23, 2020

As mentioned during live session #1, Dom and I propose to continue discussions during a virtual breakout session at TPAC. I proposed a session on Memory copies & zero-copy operations on the Web on the dedicated Wiki page and will reach out to potentially interested parties. The goals are to explore where memory needs to be copied in various Web technologies (JS, WebAssembly, WebGPU, Machine Learning, WebRTC, Media) and to identify possible architectural updates to the Web Platform that could help reduce unneeded memory copies.

@erights

erights commented Dec 23, 2024

Would the proposed Immutable ArrayBuffers help? They are at Stage 2 of the TC39 process and support zero-copy sharing.


7 participants