-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vulkan: Automatically merge render passes to the same target when possible #12242
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering this example:
- Render to framebuf 0, load clear/clear/clear
- Render to tempbuf 1, load dontcare/dontcare/dontcare
- Render to framebuf 0 (from tempbuf 1)
- Render to tempbuf 1, load dontcare/dontcare/dontcare
- Render to framebuf 0 (from tempbuf 1)
What stops us from (incorrectly) changing this to:
- Render to framebuf 0, load clear/clear/clear + from tempbuf 1 + from tempbuf 1
- Render to tempbuf 1, load dontcare/dontcare/dontcare
- Render to tempbuf 1, load dontcare/dontcare/dontcare
It almost seems like it might be easier to move them forward, and possibly split dependencies into new framebufs where appropriate (kinda like register renaming):
- Render to tempbuf 1a, load dontcare/dontcare/dontcare
- Render to tempbuf 1b, load dontcare/dontcare/dontcare
- Render to framebuf 0, load clear/clear/clear + from tempbuf 1a + from tempbuf 1b
For example, I seem to recall Wipeout billboards are drawn similar to this (i.e. throughout scene)?
-[Unknown]
Yeah, many things are missing, hence WIP. Before I do any more work on this, I'm gonna build a little system that uses vkCmdWriteTimestamp to get some accurate timing of the GPU load, so we can see if this early proof of concept is worth it. But I do believe it will be. Regarding pushing forward or back, I just wanted to prove the concept so I did the easiest thing I could get working fast. Splitting into new framebufs and moving forward in that case indeed would be a really good thing. |
…sible. Should speed things up a bit on mobile in some games that do stupid things like GoW. Currently only enabled in GoW, but plan to enable this globally as it should be quite cheap when nothing is detected.
ab0cb85
to
773cb5f
Compare
@unknownbrackets , I added a new check that I believe will solve the example you stated. I tried conceptualizing the better backwards variant (register renaming is a good analogy, if we can do it we should), but I failed to write it down - it's a bit late in the evening though. Wipeout at the start of the first track is the most egregrious example of this that I've seen, it's not that common a pattern though. BTW, one big additional annoyance with the GoW games is how they insist on clearing the screen with very large and differently tesselated meshes instead of some easily detected vertical stripes. We end up with LOAD on almost all render passes, even where CLEAR would be totally possible. Could do ugly hacks but they'd be really, really ugly... |
This is a massive boost in these games on Adreno as well, at least when you push the resolution a little (was testing on a very fast Adreno phone, it took 3x res for this to make a major difference, but yeah, 30% boost easily once it was bandwidth-bottlenecked). |
// keep going. | ||
break; | ||
case VKRStepType::COPY: | ||
if (steps[j]->copy.src == fb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, sorry, one more - what if this copy or blit writes TO the fb? We don't want those out of order either, right?
-[Unknown]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we definitely don't, and can't combine that (well, possibly such copies might be better replaced with a quad draw in a few cases, but I think the opportunities for that will be rare)
Sounds very cool. Makes sense that this would benefit - PSP games were often trying to manage the tiny constrained 2MB of VRAM, so shuffling things around and drawing only what's necessary. Modern devices don't have the same problems, so it's very cool we can combine them. I do think there could be a lot of benefit if we could track (in some cases) what areas are dirty from draws, at least when they are rectangular (not sure of the GoW clears are still rectangular?) For example, it might help potentially with splitting buffers that are being depal'd or where things are being drawn in the margin. Not sure where the tradeoff is, though. -[Unknown] |
Yeah, that kind of thing would help with Ridge Racer and many others too... Sometimes it's really hard though to know the drawn region. The GoW clears are rectangular but are not even drawn with through-mode IIRC. We'd have to software transform the positions and do some min/max to find the bounding box, then in the case of clears even more magic to compute the coverage. And that assumes that we guessed the dimensions of the render target correct from scissor/viewport.. although if we did this for everything, our guesses could be a lot better. |
Anyway, I'm merging this when CI passes, enabled only for GoW games for now, and for all games after 1.9.0 is stable. Maybe will look into also doing this with OpenGL someday. |
Should speed things up a bit on mobile in some games that do stupid things like GoW. Currently only enabled in GoW, but plan to enable this globally as it should be quite cheap when nothing is detected.
Enabling it globally will happen after 1.9.0 (due to the risk, there are probably some pass dependencies it doesn't yet detect) but might add some more games first.
Hopefully this will be enough to bring Vulkan up to GL speed on some Adrenos (see #12228).