Runtime subdivision performance is slow #20

Open
fire opened this issue Oct 15, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@fire
Contributor

fire commented Oct 15, 2022

Port the Metal shaders from OpenSubdiv? Unknown amount of work.

@tefusion
Owner

Even if it's not that much work, I don't think this is worth it. There's a lot of other handling within OpenSubdiv that makes it possible to enable adaptive subdivision (not implemented yet; I'll probably add it as an editor setting) and that also made it easy to interpolate skinning and UVs. Porting all of that seems like too much work for the benefit, imho.

What I had a short look at instead was Bfr, which was newly released with OpenSubdiv 3.5 a month ago. I don't fully understand what its downside is ("repeated evaluation of a fixed set of points" is still better done with Far, which would be the current baking implementation). The upsides I saw are that you can set it to triangulate quads at the end and that it's apparently faster than the Far-based table solution, which we don't even use currently, so runtime subdivision performance should see a significant increase (SubdivMeshInstance3D with skinning is very laggy right now at high levels).

It also didn't look too complicated to implement from the tutorials I tried, so it might be worth trying instead. All of this is future stuff though; I don't think I'll work much on this project myself in the near future beyond bugfixes/stabilization. I just needed semi-quick subdivision for a project I'll continue working on now.

@tefusion tefusion added the enhancement New feature or request label Oct 16, 2022
@fire
Contributor Author

fire commented Oct 16, 2022

I'm perfectly happy with what godot_opensubdiv has too. We're using float=64, so I have to debug that crash. The MSVC compile-time bug isn't critical.

@fire fire changed the title Compute shaders for subdiv Runtime subdivision performance is slow Oct 16, 2022
@tefusion
Owner

Hi again! I currently use the topology data stuff without subdivision for a character with lots of blendshapes, and I think I've got the two major performance problems pinned down now.

1. Triangulation Code

There are a lot of problems with this one. First off, it uses SurfaceTool internally and generates tangents every single time if the mesh has normals. Combined with the other triangulation code, that takes a total of 76 ms for a mesh of around 10,000 vertices. Remove the tangent generation and we're already down to 14 ms. Remove SurfaceTool entirely and resize the lists beforehand so nothing has to be appended, and you can halve that again. With caching of the index array and other data this can probably be cut in half once more. I won't implement this right away, but it's something I'll definitely do when I have the time.
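To make the "resize first, then cache" idea concrete, here is a minimal sketch in Python. It is not the code from this repository: the fan triangulation, the function names `triangulate_faces` and `cached_triangle_indices`, and the `topology_key` cache key are all assumptions made for illustration. The point is only that the triangle count is known up front, so the index buffer can be allocated once and reused while the topology stays the same.

```python
from array import array


def triangulate_faces(face_sizes, face_indices):
    """Fan-triangulate polygons into one flat triangle index buffer.

    face_sizes   -- number of vertices of each face, e.g. [4, 3]
    face_indices -- concatenated vertex indices of all faces
    """
    # An n-gon always yields n - 2 triangles, so the final size is known
    # before writing anything and the buffer is allocated exactly once.
    tri_count = sum(n - 2 for n in face_sizes)
    out = array("I", [0]) * (3 * tri_count)

    write = 0  # next free slot in the output buffer
    start = 0  # offset of the current face in face_indices
    for n in face_sizes:
        first = face_indices[start]
        for k in range(1, n - 1):
            out[write] = first
            out[write + 1] = face_indices[start + k]
            out[write + 2] = face_indices[start + k + 1]
            write += 3
        start += n
    return out


# Hypothetical cache: the index buffer depends only on topology, so it can
# be reused every frame while only vertex positions change.
_index_cache = {}


def cached_triangle_indices(topology_key, face_sizes, face_indices):
    cached = _index_cache.get(topology_key)
    if cached is None:
        cached = triangulate_faces(face_sizes, face_indices)
        _index_cache[topology_key] = cached
    return cached
```

For one quad followed by one triangle, `triangulate_faces([4, 3], [0, 1, 2, 3, 3, 2, 4])` writes the two triangles of the quad and then the triangle itself into a single preallocated buffer. A second call to `cached_triangle_indices` with the same key returns the same buffer object without recomputing it.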

2. Subdivision itself

Forget my former post: the new Bfr is only really suitable as a replacement for adaptive subdivision, which imo looks ugly on most things and is only really suitable for faraway objects (I just looked at the demo stuff, but e.g. a cube has a lot of free spots).

Instead, what we should use is StencilTables, so we can actually use all the different fast subdivision options OpenSubdiv has. They provide a tutorial for StencilTables, so that should not be too hard. Actually using them with the different GPU computation libraries might take longer, but all official examples use them, so at least there is a lot of reference material.

@fire
Contributor Author

fire commented Dec 5, 2022

I think we can do an optimized implementation of what we use SurfaceTool for.

What are stencil tables?

@tefusion
Owner

Stencils are used to factorize the interpolation calculations that subdivision schema apply to vertices of smooth surfaces. If the topology being subdivided remains constant, factorizing the subdivision weights into stencils during a pre-compute pass yields substantial amortizations at run-time when re-posing the control cage.

Factorizing the subdivision weights also allows to express each subdivided vertex as a weighted sum of vertices from the control cage. This step effectively removes any data inter-dependency between subdivided vertices : the computations of subdivision interpolation can be applied to each vertex in parallel without any barriers or constraint. The Osd classes leverage these properties by exploiting CPU and GPU parallelism.

from https://graphics.pixar.com/opensubdiv/docs/far_overview.html
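As a rough illustration of the quoted idea, the Python sketch below precomputes stencils for a toy scheme and then evaluates them. It does not use the OpenSubdiv API. Chaikin corner cutting on a closed polygon stands in for a real surface scheme, and the helper names (`chaikin_rows`, `compose`, `build_stencils`, `flatten`, `apply_stencils`) and the flat sizes/indices/weights layout are assumptions chosen for the example.

```python
def chaikin_rows(n):
    """One Chaikin step on a closed polygon with n points.

    Each output point is a dict {input index: weight} over the previous level.
    """
    rows = []
    for i in range(n):
        j = (i + 1) % n
        rows.append({i: 0.75, j: 0.25})
        rows.append({i: 0.25, j: 0.75})
    return rows


def compose(rows, prev_stencils):
    """Re-express rows over the previous level as weights over the control points."""
    out = []
    for row in rows:
        acc = {}
        for idx, w in row.items():
            for ctrl, cw in prev_stencils[idx].items():
                acc[ctrl] = acc.get(ctrl, 0.0) + w * cw
        out.append(acc)
    return out


def build_stencils(n_control, levels):
    """Pre-compute pass: depends on topology only, never on positions."""
    stencils = [{i: 1.0} for i in range(n_control)]  # level 0 is the identity
    n = n_control
    for _ in range(levels):
        stencils = compose(chaikin_rows(n), stencils)
        n *= 2
    return stencils


def flatten(stencils):
    """Pack the stencils into flat arrays that are easy to hand to a kernel."""
    sizes, indices, weights = [], [], []
    for stencil in stencils:
        sizes.append(len(stencil))
        for idx in sorted(stencil):
            indices.append(idx)
            weights.append(stencil[idx])
    return sizes, indices, weights


def apply_stencils(table, control_points):
    """Run-time pass: every output point is an independent weighted sum."""
    sizes, indices, weights = table
    out = []
    cursor = 0
    for size in sizes:
        x = y = 0.0
        for k in range(cursor, cursor + size):
            px, py = control_points[indices[k]]
            x += weights[k] * px
            y += weights[k] * py
        out.append((x, y))
        cursor += size
    return out
```

The table would be built once, for example `table = flatten(build_stencils(4, 2))`, and then `apply_stencils(table, cage)` is called again whenever the control cage is re-posed. Because the loop body for one output point never reads another output point, the outer loop is the part that could be spread across CPU threads or a GPU kernel.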

The second part is the important thing. Currently only simple subdivision algorithms are used here, with no real parallelism, and stencils make it possible to use OpenCL/GLSL/... or CPU parallelism. I still don't fully know how to implement it well so that someone could come along and use the library they like. My current bet is to do something like this

2 participants