Slowdown with framebuffer emulation enable (GLES) #592

Closed
Gillou68310 opened this issue Jun 25, 2015 · 23 comments

Comments

@Gillou68310
Contributor

Here's a video of the slowdown. It happens when the birds start flying out of the tree (game speed is set to 250%).

https://dl.dropboxusercontent.com/u/27654797/mupen64plus-ui-console%202015-06-25%2018-21-00-29.avi

This seems to be caused by a dummy swapbuffer, here's a statistics graph showing draw calls (the slowdown starts after the red line ;-))

[image: untitled]

And here's a list of GL calls during the dummy swapbuffer:

[image: untitled 2]

Sounds like a useless call to `FrameBufferList::renderBuffer(u32 _address)` ;-)

@Gillou68310
Contributor Author

const bool bNeedUpdate = gDP.colorImage.changed != 0 || (bCFB ? true : (*REG.VI_ORIGIN != VI.lastOrigin));

@gonetz Is it really necessary to update on (*REG.VI_ORIGIN != VI.lastOrigin) (noob question here)?

The dummy swapbuffer happens when gDP.colorImage.changed == 0 && (*REG.VI_ORIGIN != VI.lastOrigin)

@gonetz
Owner

gonetz commented Jun 27, 2015

> Is it really necessary to update on (*REG.VI_ORIGIN != VI.lastOrigin)

Yes. Look:

  • *REG.VI_ORIGIN != VI.lastOrigin means that the game switches to another frame buffer. This is the case for double/triple buffering. We definitely must switch buffers too.
  • bCFB means that the frame buffer is filled by the CPU. Double buffering may not be used in that case, that is, the CPU may update the same buffer. Strange, but true. Thus, I can't rely on the *REG.VI_ORIGIN != VI.lastOrigin condition in that case. A dummy swapbuffer for CFB is normal.
  • gDP.colorImage.changed is a less obvious condition. I thought that it was useless, because all games use at least double buffering when not in CFB mode. I had to add it for Quake II: it switches to single buffer mode when underwater. In that case *REG.VI_ORIGIN != VI.lastOrigin does not work either and the screen is not updated.
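
The three conditions above can be restated as a small predicate. This is only a sketch of the `bNeedUpdate` expression quoted earlier in the thread: the `ViState` struct and its field names are hypothetical stand-ins for the plugin's real state (`bCFB`, `gDP.colorImage.changed`, `*REG.VI_ORIGIN`, `VI.lastOrigin`), not GLideN64's actual data layout.

```cpp
#include <cstdint>

// Hypothetical snapshot of the values the quoted expression reads.
// Names are illustrative only, not the plugin's real types.
struct ViState {
    bool          cpuFrameBuffer;     // bCFB: frame buffer is filled by the CPU
    bool          colorImageChanged;  // gDP.colorImage.changed != 0
    std::uint32_t viOrigin;           // *REG.VI_ORIGIN
    std::uint32_t lastOrigin;         // VI.lastOrigin
};

// Mirrors the quoted bNeedUpdate expression:
//   changed != 0 || (bCFB ? true : (VI_ORIGIN != lastOrigin))
inline bool needUpdate(const ViState& s) {
    const bool originMoved = s.viOrigin != s.lastOrigin;
    return s.colorImageChanged || (s.cpuFrameBuffer ? true : originMoved);
}
```

With this predicate, a non-CFB game that draws into a single buffer (the Quake II underwater case) still gets an update through `colorImageChanged`, and a CFB game always updates.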

I set a conditional breakpoint: gDP.colorImage.changed == 0 && (*REG.VI_ORIGIN != VI.lastOrigin)
It triggers constantly in SM64 while the Mario head is on screen. The game does not swap buffers right after the buffer is drawn. That is, the first VI_UpdateScreen() after RSP_ProcessDList() uses the old buffer. It would not cause a buffer swap without gDP.colorImage.changed != 0. On the next VI_UpdateScreen(), gDP.colorImage.changed == 0 (the new display list is not ready yet), but *REG.VI_ORIGIN != VI.lastOrigin, and the buffers swap again. That does not cause any slowdown.

The situation is different in game. The breakpoint triggered for the first time only when I got close to the bridge. The question is: why would a dummy swapbuffer cause a slowdown? It just renders one screen-size textured rectangle. Your draw calls graph looks strange after the red line. Before the line, the number of calls is almost constant. After the line, we see cyclic drops to zero followed by peaks, where the number of calls is almost doubled. What causes these drops and peaks? I suspect that the drops correspond to dummy swapbuffers, which take only a few commands. But what causes the peaks? Probably the number of birds, which are polygonal, not just sprites.

Please remove the gDP.colorImage.changed != 0 condition, which is useless for this game, and check performance again.
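
To make the effect of that condition concrete, here is a rough simulation of the late-swap pattern described above for a non-CFB game. It is a sketch under stated assumptions, not plugin code: `ViEvent`, `countSwaps` and `lateSwapPattern` are hypothetical names, the origin addresses are made up, and the model assumes the changed flag is set by a new display list and cleared again by the following VI update.

```cpp
#include <cstdint>
#include <vector>

// One VI update as seen by the plugin (hypothetical model).
struct ViEvent {
    bool          newDisplayList;  // a display list was processed since the last VI update
    std::uint32_t viOrigin;        // value of VI_ORIGIN at this VI update
};

// Counts how many VI updates would render the buffer and swap.
// useColorImageChanged selects whether the colorImage.changed term is part of the test.
inline int countSwaps(const std::vector<ViEvent>& events,
                      std::uint32_t initialOrigin,
                      bool useColorImageChanged) {
    std::uint32_t lastOrigin = initialOrigin;
    int swaps = 0;
    for (const ViEvent& e : events) {
        const bool colorImageChanged = e.newDisplayList;
        const bool originMoved = e.viOrigin != lastOrigin;
        if ((useColorImageChanged && colorImageChanged) || originMoved) {
            ++swaps;
            lastOrigin = e.viOrigin;
        }
    }
    return swaps;
}

// Two game frames where VI_ORIGIN only moves on the VI update *after*
// the one that follows the display list (the pattern described for SM64).
inline std::vector<ViEvent> lateSwapPattern() {
    return {
        {true,  0xA000}, {false, 0xB000},   // frame 1: drawn, then origin moves
        {true,  0xB000}, {false, 0xA000},   // frame 2: drawn, then origin moves back
    };
}
```

In this model, starting from origin `0xA000`, the pattern yields four swaps with the condition and two without it, i.e. one extra swap per game frame.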

@Gillou68310
Contributor Author

Thanks for the detailed explanations, very instructive!!!

> I suspect that the drops correspond to dummy swapbuffers, which take only a few commands. But what causes the peaks? Probably the number of birds, which are polygonal, not just sprites.

That's correct! Here's a new graph without the gDP.colorImage.changed != 0 condition:

[image: untitled]

Birds appear after the red line and the number of draw calls jumps from ~110 to ~190.

Also no more slowdown with removed condition ;-)

@gonetz
Owner

gonetz commented Jun 28, 2015

> Also no more slowdown with removed condition ;-)

Interesting, why no more slowdown? Is it so expensive to draw one additional screen-size texture rectangle and swap buffers? Dummy swapbuffer does not cause slowdown with Mario head. Or does it? Could you test performance of Mario head scene with and without the "gDP.colorImage.changed != 0" condition?

@Gillou68310
Contributor Author

> Dummy swapbuffer does not cause slowdown with Mario head. Or does it? Could you test performance of Mario head scene with and without the "gDP.colorImage.changed != 0" condition?

Just checked and yes it's also slower in Mario's head scene!

> Is it so expensive to draw one additional screen-size texture rectangle and swap buffers?

Sounds weird indeed! But I think swapbuffer is the cause here. If I switch off framebuffer emulation and force swapping buffers 2 times per frame, I get similar slowdowns. It's worth mentioning that it's not noticeable at 100% speed, only when you try to speed up the emu (it just won't speed up).

@gonetz
Owner

gonetz commented Jun 29, 2015

Thanks for testing! The results are informative.
I'd make the "gDP.colorImage.changed != 0" check optional if I knew a game where it causes a slowdown in normal conditions, that is, with frame limits on. Anyway, it's useful to know that I have a potential performance eater here.

@Gillou68310
Contributor Author

> I'd make the "gDP.colorImage.changed != 0" check optional if I knew a game where it causes a slowdown in normal conditions, that is, with frame limits on

That is probably the case on low-end Android devices, which is why I tracked it down in the first place ;-)

@Gillou68310
Contributor Author

Is quake2 the only game which requires the condition?

@gonetz
Owner

gonetz commented Jun 29, 2015

> Is quake2 the only game which requires the condition?

I don't know. Quake II is the game for which I added that condition. It is probably the only such game. I don't have time to test all games.

@Gillou68310
Contributor Author

Hmm, eglSwapBuffers being slow seems to be a common issue on Android, damn!

@Gillou68310
Contributor Author

Just thinking out loud, but what would happen if we forced the buffer swap to happen only on the last VI update, so that no matter whether a game uses double or even triple buffering, swapbuffer is called only once per frame?

@gonetz
Owner

gonetz commented Jun 30, 2015

"last vi update" - what is it?
We must do FrameBufferList::renderBuffer and swap buffers when *REG.VI_ORIGIN changes.

Regarding buffer swap: it is actually possible to not use buffer swap at all and render the FBO right into the front buffer. The original glN64 works exactly like that. Rendering to the front buffer without a buffer swap does not work on Linux for some reason though, so I had to use swapbuffers there. Since that works on other platforms as well, I decided to make it common. Later I found that GLES2 does not support the glDrawBuffer command, so buffer swap was the only option.

I don't think that eglSwapBuffers is a bottleneck, because it does not cause performance problems for other video plugins. mupen64plus-video-gliden64 is close to mupen64plus-video-gln64 in terms of OpenGL usage. It should have similar performance. The main difference is in the shaders. GLideN64 fragment shaders are quite large and complex, and contain code which is redundant for most games. Shader simplification can probably boost performance on GLES2 devices.

@Gillou68310
Contributor Author

Maybe another driver issue! I will make some measurements just to be sure, but it seems like the driver is waiting for eglSwapBuffers to complete when it should be mostly asynchronous.

@Gillou68310
Contributor Author

OK, here are my results:
[image: capture]
[image: capture2]

Blue line: Number of draw calls
Red line: Time spent in the eglSwapBuffers function, in milliseconds

You can see that the first five dummy swapbuffers are OK, but then the function suddenly takes ~9 ms to complete.
This has been tested on both real GLES hardware and Power SDK emulator.
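
For anyone who wants to reproduce this kind of measurement, a minimal timing helper could look like the sketch below. The thread does not say how the numbers above were collected, so this is an assumption about one reasonable way to do it; `timeCallMs` is a hypothetical name. It only measures how long the call blocks the calling thread, not the GPU work behind it.

```cpp
#include <chrono>
#include <utility>

// Runs `call` once and returns the wall-clock time it took, in milliseconds.
// steady_clock is used because it is monotonic and unaffected by clock changes.
template <typename Fn>
double timeCallMs(Fn&& call) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(call)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In a plugin it would wrap the swap, for example `timeCallMs([&] { eglSwapBuffers(display, surface); })`, where `display` and `surface` are the application's own `EGLDisplay` and `EGLSurface`, and the result would be logged next to the per-frame draw call count.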

@Gillou68310
Contributor Author

Same graphs at 100% speed:
[image: capture]
[image: capture2]

Gillou68310 changed the title from "Mario64 slowdown with framebuffer emulation enable" to "Slowdown with framebuffer emulation enable (GLES)" on Jul 2, 2015
@Gillou68310
Contributor Author

The same problem is present in Zelda OoT.

> I'd make the "gDP.colorImage.changed != 0" check optional if I knew a game where it causes a slowdown in normal conditions, that is, with frame limits on

Maybe we can make it optional for GLES only? What do you think?

@gonetz
Owner

gonetz commented Jul 2, 2015

Does it give a visible performance gain on a real device?

@Gillou68310
Contributor Author

Definitely! But don't hesitate to try on your Note3 if you want to see it in action (and have some time of course).

@Gillou68310
Contributor Author

I think the worst candidate is GoldenEye, because the problem is noticeable at 100% speed (the game is already slow, but the dummy swapbuffer makes it even worse).

Here's a savestate if you wanna try:
https://dl.dropboxusercontent.com/u/27654797/GoldenEye%20007%20(U)%20%5B!%5D.st0

@gonetz
Owner

gonetz commented Jul 2, 2015

I believe you. OK, I'll make the "gDP.colorImage.changed != 0" check Quake II specific.

@Gillou68310
Contributor Author

Thanks! I hope it won't break anything else ;-)

@fzurita
Contributor

fzurita commented Jul 3, 2015

Instead of reducing accuracy for speed, maybe the right answer would be to have a config parameter to enable/disable this check.

This check could be disabled by default and a warning could be provided to a user that enables it saying that it "will improve accuracy but reduce performance". I believe this is the approach that Dolphin uses.

@gonetz
Owner

gonetz commented Jul 3, 2015

It's not exactly an accuracy vs. speed problem. This check is a workaround for a problem which probably exists in only one game. If the problem turns out to be more common, the solution can include a new config parameter.
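
The two options discussed here, a game-specific workaround and a user-facing config parameter, are not mutually exclusive. The sketch below shows one way they could combine. Everything in it is hypothetical: `FrameBufferConfig`, `configForGame`, the option name and the `"Quake II"` name string are placeholders, not GLideN64's actual settings or ROM identifiers.

```cpp
#include <string>

// Hypothetical per-game settings; the field name is illustrative only.
struct FrameBufferConfig {
    bool updateOnColorImageChange = false;  // keep the colorImage.changed check?
};

// Enables the check for the one game known to need it, or when the user
// explicitly asks for it. The name comparison is a placeholder for however
// the plugin really identifies a ROM.
inline FrameBufferConfig configForGame(const std::string& romName,
                                       bool userForcesCheck = false) {
    FrameBufferConfig cfg;
    const bool needsWorkaround = (romName == "Quake II");  // hypothetical identifier
    cfg.updateOnColorImageChange = needsWorkaround || userForcesCheck;
    return cfg;
}
```

Off by default keeps the extra swap out of every other game, while the override leaves a way to re-enable the check if another single-buffered title turns up.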
