Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stand-alone passes #440

Merged
merged 9 commits into from
Jan 14, 2020
Merged

Stand-alone passes #440

merged 9 commits into from
Jan 14, 2020

Conversation

kvark
Copy link
Member

@kvark kvark commented Jan 7, 2020

Fixes #438

@kvark kvark requested a review from grovesNL January 7, 2020 22:33
@kvark kvark changed the title Stand-alone compute passes Stand-alone passes Jan 8, 2020
@kvark kvark changed the title Stand-alone passes [WIP] Stand-alone passes Jan 8, 2020
@kvark kvark mentioned this pull request Jan 8, 2020
@kvark
Copy link
Member Author

kvark commented Jan 8, 2020

Judging purely by the number of lines, you can see how there is something crazy going on here. The last commits integrates peek-poke for serializing the command stream of a (compute) pass.

The main driver for this (as opposed to an enum Command that has everything) is taking care of commands that have variable input count, such as set_vertex_buffers and set_bind_group. By serializing the commands, we can just append the extra data to the byte stream. This may also result in smaller data size for the recorded pass, and improved data cache locality for decoding it (less gaps).

The cost of this is more unsafe code, which is unfortunate. Note: WebRender also uses peek-poke, so it's not completely crazy for us either.

Another interesting bit is RawPass, which is a repr(C) structure (works somewhat well with raw nature of peek-poke). The idea is that Gecko C++ code would use it directly on the stack, while only wgpu-native would have to box it or put into a storage (to hide behind an immutable ID). According to the proposal in gfx-rs/wgpu-rs#156, wgpu-rs should also be able to place it directly, avoiding indirection on every access to the pass.

@kvark
Copy link
Member Author

kvark commented Jan 10, 2020

I'm seeing something that looks like a compiler bug when running the compute example (e.g. with make run-example-compute) from rev 5ae0623.

This is the offending piece:

    WGPURawComputePassId command_pass = //temp name
        wgpu_command_encoder_begin_compute_pass(encoder, NULL);

What I see is that wgpu_command_encoder_begin_compute_pass returns a proper 64 bit pointer. But it gets truncated to 32bits (see movslq instruction in the assembly screenshot) before assigning to command_pass (which is also 64 bit):

Screen Shot 2020-01-10 at 12 18 25

@jrmuizel any ideas what I'd be doing wrong here? I might be too quick to call it a compiler bug :)
Edit: this was tested on macOS (screenshot is from XCode) so far
Edit2: turns out there was a mashup between the header and the module, hence the broken assembly.

@kvark
Copy link
Member Author

kvark commented Jan 11, 2020

Hitting this in cbindgen (cc @emilio):

thread 'main' panicked at 'Unable to mangle generic parameter ConstPtr(Path(GenericPath { path: Path { name: "TextureViewId" }, export_name: "TextureViewId", generics: [], ctype: None })) for 'RenderPassColorAttachmentDescriptorBase'', /Users/dmalyshau/.cargo/registry/src/github.com-1ecc6299db9ec823/cbindgen-0.12.1/src/bindgen/mangle.rs:52:17

Will probably be able to work around, but still annoying...

@kvark
Copy link
Member Author

kvark commented Jan 11, 2020

Alright, at this stage raw compute and render passes work for the examples of this repo 🎉

@emilio
Copy link
Contributor

emilio commented Jan 11, 2020

If you have a test-case for that I'd be happy to take a look.

@kvark kvark changed the title [WIP] Stand-alone passes Stand-alone passes Jan 12, 2020
@kvark
Copy link
Member Author

kvark commented Jan 12, 2020

Tested on wgpu-rs examples (see gfx-rs/wgpu-rs#155), no regressions.
@grovesNL this is ready for merging, if you have time to review it.

I'm going to miss the immediate pass recording, but at this point it appears to me that having a single consistent code path for native and non-native is a bigger win. We can re-explore this direction later.

@kvark
Copy link
Member Author

kvark commented Jan 12, 2020

Interestingly, this change can also be seen as a first step towards de-storaging some of the objects. I.e. going from id of Storage<T> to just Box<T>. This makes sense for objects that are few and/or temporary, such as passes, command buffers, devices, but not for resources.

@kvark kvark changed the title Stand-alone passes [WIP] Stand-alone passes Jan 12, 2020
@kvark
Copy link
Member Author

kvark commented Jan 12, 2020

Found an issue - since we are soft recording with this PR, we aren't keeping the objects alive after they are used in the recording.

@kvark kvark changed the title [WIP] Stand-alone passes Stand-alone passes Jan 14, 2020
uint32_t start_slot,
const WGPUBufferId *buffers,
const WGPUBufferId *buffer_ids,
const WGPUBufferAddress *offsets,
uintptr_t length);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly relevant to this PR but many of these uintptr_t are uint64_t in webgpu-headers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to #452

trackers,
}
}
#[derive(Clone, Copy, Debug, PeekCopy, Poke)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be repr(C)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These don't need to be repr(C) because we never lay them out in memory that C code works with.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought peek_poke expects to operate on repr(C) types to have certain guarantees about the data layout, or at least the peek_poke examples seem to do that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I'll double check!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The webrender examples uses repr(C) because it was written for Gecko C++ code to see the types and copied verbatim into peek-poke.

That is to say, I consulted with @Gankra and it turns out there can be a slightly bigger chance of the compiler optimization (turning the struct poking into a memcopy) if it's repr(C). Still not a guarantee, still not a correctness concern, just a bit more likely.

So we should make these types repr(C) as you suggest, at least the Rect thing.

}

#[derive(Clone, Copy, Debug, PeekCopy, Poke)]
enum RenderCommand {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be repr(C)?

}

#[derive(Clone, Copy, Debug, PeekCopy, Poke)]
pub struct Rect<T> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be repr(C)?

index: u8,
num_dynamic_offsets: u8,
bind_group_id: id::BindGroupId,
phantom_offsets: PhantomSlice<BufferAddress>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it a bit confusing to add PhantomSlice to the end of these variants because I wasn't sure if they were markers for peek_poke to know to treat them specially somehow.

Instead of using this marker type, could we replace it by comments to indicate that there is some dynamically sized array at the end of these variants and have decode as a free function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having at least some type-level semantic of the data following commands is helpful. having the decode strictly associated with a PhantomSlice makes it harder to fail when writing and maintaining this code. Comments wouldn't be as helpful.

Copy link
Collaborator

@grovesNL grovesNL Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one interesting (future) option would be for peek_poke to provide a marker type to declare the size of variable length arrays, i.e.

#[derive(PeekCopy, Poke)]
enum RenderCommand {
    SetBindGroup {
        index: u8,
        num_dynamic_offsets: u8,
        #[peek_poke(array_size = "num_dynamic_offsets")]
        dynamic_offsets: peek_poke::VariableLengthArray<BufferAddress>,
    },
    /* ... */
}

@kvark
Copy link
Member Author

kvark commented Jan 14, 2020

@grovesNL thank you for having a look! I think we can proceed, no major concerns here.
bors r=me,grovesNL

bors bot added a commit that referenced this pull request Jan 14, 2020
440: Stand-alone passes r=me,grovesNL a=kvark

Fixes #438 

Co-authored-by: Dzmitry Malyshau <[email protected]>
@bors
Copy link
Contributor

bors bot commented Jan 14, 2020

Build succeeded

@bors bors bot merged commit a510197 into gfx-rs:master Jan 14, 2020
@kvark kvark deleted the client-pass branch August 24, 2021 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Standalone pass recording
3 participants