Standalone pass recording #438
Comments
Well, late-night programming backfires. Also, the previous plan is crushed by reality... If we do the look-up on every operation, we are still going to pay some (though less) overhead, even with software pass recording. Alternatively, the recorded passes would carry no tracking information, and that would need to be rebuilt by the implementation.
Progress with #440 produced one more insight: there is no point in making the pass structures open.
So here is what this is shaping up to look like: we'll have opaque pass builders exposed by `wgpu-core`.
439: Refactor usage tracking to be truly sparse r=dependency a=kvark

~~This is a required step towards #438. We want to be tracking the usage by only having resource IDs around, not the resources themselves, so we aren't going to have access to `full_selector`.~~

It also streamlines some of the verbose parts of the internal API for usage tracking, and uses `SmallVec` for the texture tracker ranges, essentially making them lightweight for the most common case (where the layer count is 1).

Compromises:
- removes `SEPARATE_DEPTH_STENCIL_STATES` support; enabling it isn't planned for the near future anyway.

Co-authored-by: Dzmitry Malyshau <[email protected]>
Just as #440 was getting ready to be merged, I found an issue with the original design. If the client-side recording doesn't do anything with the IDs and just passes them through, it can't guarantee that all of the related objects are still alive by the time the recording is finished. This is actually a bigger issue than just the code; I see it as a problem with the spec. Suppose the user drops a resource that a pass recording is still using before the recording is finished.

@Kangz this leads me to believe the current API is tailored towards Wire-style processing, where there is exactly one communication channel between the client and the server, i.e. everything is serialized. That seems like a major constraint on parallelizing both sides in the future (a Web worker recording a command buffer -> a thread on the GPU process side). Am I overthinking this?
@grovesNL here is what I'm thinking so far. We should be able to proceed with the code in #440 if the users of `wgpu-core` enforce guarantee [1].

This means that we are still going to be super fast when going through […]. Also, cc @jdashg in case you have a strong opinion about ^ and time to give feedback.

[1] Any resource used by a pass recording stays alive at least until the end of the pass.
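For concreteness, here is a minimal sketch of how a user of `wgpu-core` could uphold guarantee [1] on its own side, simply by holding strong references to everything a recording touches until the pass is finished. The `Buffer` and `RenderPass` types and methods below are hypothetical placeholders, not the actual wgpu-rs API:

```rust
use std::sync::Arc;

// Illustrative only: the recording stores bare IDs, but also clones a strong
// reference to each resource it touches, so nothing it recorded can be freed
// before the pass is finished.
struct Buffer {
    id: u64, // the wgpu-core ID that actually gets recorded
}

struct RenderPass {
    commands: Vec<String>,        // placeholder for the recorded commands
    keep_alive: Vec<Arc<Buffer>>, // strong references held until the pass ends
}

impl RenderPass {
    fn new() -> Self {
        RenderPass { commands: Vec::new(), keep_alive: Vec::new() }
    }

    fn set_vertex_buffer(&mut self, slot: u32, buffer: &Arc<Buffer>) {
        // Only the ID crosses into the recording...
        self.commands.push(format!("set_vertex_buffer slot={} buffer={}", slot, buffer.id));
        // ...but the cloned reference is what satisfies guarantee [1].
        self.keep_alive.push(Arc::clone(buffer));
    }

    fn finish(self) -> Vec<String> {
        // The strong references are released only here, after the recording
        // is complete and handed off.
        self.commands
    }
}

fn main() {
    let buffer = Arc::new(Buffer { id: 42 });
    let mut pass = RenderPass::new();
    pass.set_vertex_buffer(0, &buffer);
    drop(buffer); // the user's handle is gone, but the recording still holds one
    let commands = pass.finish();
    assert_eq!(commands.len(), 1);
}
```

A binding could express the same guarantee with borrows at the type level instead of reference counting; either way, the point is that the guarantee lives in the user of `wgpu-core` rather than in the core itself.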
Would we have to resolve this issue in wgpu-native to implement webgpu-headers too?
@grovesNL Good point! The answer depends on how we specify it in webgpu-headers; the discussion needs to take place with all the stakeholders.

So far, it doesn't appear immediately useful to me to have the ability to drop something that you are using in a pass. Most often, today at least, you'd create temporary buffers to upload data, which means you'd be doing transfer operations; that's not expected to happen inside a pass.
Thought about this some more. Having the references be held on the JS side seems most reasonable to me, since it's supposedly easier than reaching out to […].

The ability (possible, not yet tested) to guarantee this at the type level of […].

The problem with […]
Update: this is quite easy to bake into wgpu-rs 🎉; gfx-rs/wgpu-rs#155 is now updated.
Changes to […]. Possible ways to proceed: […]
The recent WebGL client/host split for OOP does GC/CC reference tracking precisely on the client side, fwiw. I'm in favor of that solution.
In Chromium / Dawn we're looking at serializing each command separately with the (id, generation) of objects. Imagine we are doing […]. If you want to instead have a meta IPC command like […]
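Either way, the per-command variant could look roughly like the sketch below. The `ObjectId`, `WireCommand`, and `Registry` names are made up for illustration; this is not Dawn's or wgpu's actual wire format, just the general (index, generation) idea described above.

```rust
// Each object is referenced over the wire as (index, generation), so the
// receiving side can reject commands that point at a slot whose object has
// since been destroyed and the slot reused.
#[derive(Clone, Copy)]
struct ObjectId {
    index: u32,
    generation: u32,
}

enum WireCommand {
    SetBindGroup { pass: ObjectId, index: u32, bind_group: ObjectId },
    Draw { pass: ObjectId, vertex_count: u32 },
}

struct Registry {
    // Generation currently stored at each index; bumped whenever a slot is reused.
    generations: Vec<u32>,
}

impl Registry {
    fn check(&self, id: ObjectId) -> Result<(), &'static str> {
        match self.generations.get(id.index as usize) {
            Some(&gen) if gen == id.generation => Ok(()),
            _ => Err("stale or unknown object id"),
        }
    }
}

// Commands are decoded and validated one at a time, as they arrive.
fn handle(registry: &Registry, command: &WireCommand) -> Result<(), &'static str> {
    match command {
        WireCommand::SetBindGroup { pass, bind_group, .. } => {
            registry.check(*pass)?;
            registry.check(*bind_group)?;
        }
        WireCommand::Draw { pass, .. } => {
            registry.check(*pass)?;
        }
    }
    Ok(())
}

fn main() {
    let registry = Registry { generations: vec![1, 3] };

    let live = WireCommand::Draw { pass: ObjectId { index: 1, generation: 3 }, vertex_count: 3 };
    assert!(handle(&registry, &live).is_ok());

    let bind = WireCommand::SetBindGroup {
        pass: ObjectId { index: 1, generation: 3 },
        index: 0,
        bind_group: ObjectId { index: 0, generation: 1 },
    };
    assert!(handle(&registry, &bind).is_ok());

    // Same slot, older generation: the referenced object no longer exists.
    let stale = WireCommand::Draw { pass: ObjectId { index: 1, generation: 2 }, vertex_count: 3 };
    assert!(handle(&registry, &stale).is_err());
}
```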
Currently, each and every operation on a command encoder or a pass goes directly into the driver. This was done with the intention of having minimal overhead and maximum simplicity. However, I don't think the effort is paying off: each pass operation still has to switch on the backend type and (more importantly) lock at least one storage (e.g. for the pass).
This approach is not feasible for powering the WebGPU implementation in Gecko, where `wgpu` lives in a separate process, and thus direct low-overhead access on each command is not an option.

From the early days, the plan was to have "software" passes recorded on the client side and then actually provided to `wgpu` on a command-by-command basis on the server (after crossing the IPC). The idea was that we'd figure out all the usage across the pass, so that we could optionally pass that along when starting the pass, telling `wgpu` not to bother inserting transitions and to instead just validate that the given usages are correct.
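As a rough illustration, such a recorded pass could be little more than a list of ID-referencing commands plus an optional usage summary; the types below are hypothetical stand-ins, not the actual `wgpu-core` API:

```rust
// Client side: recording is just pushing plain data that references IDs.
// No backend switch, no storage locks, nothing touches the driver yet.
type ResourceId = u64;

enum PassCommand {
    SetPipeline(ResourceId),
    SetBindGroup { index: u32, bind_group: ResourceId },
    Draw { vertex_count: u32, instance_count: u32 },
}

// Optional summary computed while recording: which resources the pass uses and
// how, so the server can validate the declared usages instead of discovering
// the transitions on its own.
struct UsageSummary {
    buffers: Vec<(ResourceId, u32 /* usage bits */)>,
    textures: Vec<(ResourceId, u32 /* usage bits */)>,
}

struct RecordedPass {
    commands: Vec<PassCommand>,
    usage: Option<UsageSummary>,
}

// Server side (after the IPC hop): the whole pass is replayed in one go.
fn replay(pass: &RecordedPass) {
    if let Some(_usage) = &pass.usage {
        // Validate the declared usages up front; skip per-command transition
        // tracking inside the pass.
    }
    for command in &pass.commands {
        match command {
            PassCommand::SetPipeline(_id) => { /* look up the pipeline and bind it */ }
            PassCommand::SetBindGroup { .. } => { /* resolve the bind group by ID */ }
            PassCommand::Draw { .. } => { /* validate state and encode the draw */ }
        }
    }
}

fn main() {
    let pass = RecordedPass {
        commands: vec![
            PassCommand::SetPipeline(7),
            PassCommand::SetBindGroup { index: 0, bind_group: 11 },
            PassCommand::Draw { vertex_count: 3, instance_count: 1 },
        ],
        usage: None,
    };
    replay(&pass);
}
```

Recording then costs little more than a `Vec` push on the client, while the locking and backend dispatch happen once per pass on the server.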
The problems here are many, as it turns out now:

- a different API in `wgpu-core` for pass recording is a major complication

Fortunately, I believe these are solvable. My suggestion is to move to software passes consistently everywhere, not just for the Gecko implementation. This would give us the following advantages:

- potentially a single native command buffer per `wgpu` command buffer (unlike today, where each pass ends up being a native command buffer, and we insert more for stitching...)