Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long loading time with large shapes #119

Closed
martinboue opened this issue May 5, 2023 · 20 comments · Fixed by #159
Closed

Long loading time with large shapes #119

martinboue opened this issue May 5, 2023 · 20 comments · Fixed by #159

Comments

@martinboue
Copy link
Contributor

martinboue commented May 5, 2023

Context:
Version: 3bc77ac
Godot version: 4.0.2

Issue:

I encounter long loading times, from 5 to 15 seconds depending on the computer, when loading a scene with large shapes (mainly SS2D_Shape_Closed). This happens when opening the scene in the editor or when loading the scene in game.

I have about ten shapes with a total of 1024 points spread over an area of 32,000x32,000px. Most of them have curved edges. All shapes have a shape material and most of them have multiple textures (edge or fill).

If I remove the shape material from all these shapes, I no longer have a performance issue.

Here is an example of one of the ten shapes:
image
I can upload more configuration screenshots if needed.

How to reproduce:

I have made a branch on my project and clean up the code for a minimal reproduction:
https://github.com/martinboue/ludum-dare-53/tree/smart-shape-performance
You can start the project and click "Load level". The problematic scene can be found at "res://src/level/level.tscn".

Suggestions

Is there anything I can do differently on my side to improve performance?

Smart Shape seems to do some calculations when loading, could these calculations be saved/cached?

Thanks!

EDIT: I'm ready to work on a PR if needed, but I would need some help to go in the right direction (I don't know how the addon works internally)

@martinboue
Copy link
Contributor Author

martinboue commented May 6, 2023

Here is the profiler graph when loading the scene, we can see the Process time at 4424ms which cause the freeze when loading the scene.
image
Script functions do not take much time:
image

Going step by step, I see that it is the _build_edge_with_material method of shape_base.gd that is taking a long time. It is called by shape_base.gd via _process > _on_dirty_update > cache_edges (shape_closed.gd) > _build_edges > _build_edge_with_material_thread_wrapper > _build_edge_with_material :

func _build_edge_with_material(
index_map: SS2D_IndexMap, c_offset: float, default_quad_width: float
) -> SS2D_Edge:
var verts_t: PackedVector2Array = get_tessellated_points()
var verts: PackedVector2Array = get_vertices()
var edge := SS2D_Edge.new()
var is_edge_contiguous: bool = _is_edge_contiguous(index_map, verts)
edge.wrap_around = is_edge_contiguous
if not index_map.is_valid():
return edge
var c_scale := 1.0
var c_extends := 0.0
var edge_material_meta: SS2D_Material_Edge_Metadata = null
var edge_material: SS2D_Material_Edge = null
if index_map.object != null:
edge_material_meta = index_map.object
if edge_material_meta == null:
return edge
if not edge_material_meta.render:
return edge
edge_material = edge_material_meta.edge_material
if edge_material == null:
return edge
c_offset += edge_material_meta.offset
edge.z_index = edge_material_meta.z_index
edge.z_as_relative = edge_material_meta.z_as_relative
edge.material = edge_material_meta.edge_material.material
var first_idx: int = index_map.indicies[0]
var last_idx: int = index_map.indicies[-1]
var first_idx_t: int = get_tessellated_idx_from_point(verts, verts_t, first_idx)
var last_idx_t: int = get_tessellated_idx_from_point(verts, verts_t, last_idx)
edge.first_point_key = _points.get_point_key_at_index(first_idx)
edge.last_point_key = _points.get_point_key_at_index(last_idx)
# How many tessellated points are contained within this index map?
var tess_point_count: int = _edge_data_get_tess_point_count(index_map)
var i := 0
while i < tess_point_count:
var tess_idx: int = (first_idx_t + i) % verts_t.size()
var tess_idx_next: int = _get_next_unique_point_idx(tess_idx, verts_t, true)
var tess_idx_prev: int = _get_previous_unique_point_idx(tess_idx, verts_t, true)
# set next_point_delta
# next_point_delta is the number of tess_pts from
# the current tess_pt to the next unique tess_pt
# unique meaning it has a different position from the current tess_pt
var next_point_delta := 0
for j in range(verts_t.size()):
if ((tess_idx + j) % verts_t.size()) == tess_idx_next:
next_point_delta = j
break
var vert_idx: int = get_vertex_idx_from_tessellated_point(
verts, verts_t, tess_idx)
var vert_key: int = get_point_key_at_index(vert_idx)
var pt: Vector2 = verts_t[tess_idx]
var pt_next: Vector2 = verts_t[tess_idx_next]
var pt_prev: Vector2 = verts_t[tess_idx_prev]
var texture_idx := 0
var flip_x: bool = get_point_texture_flip(vert_key)
var width_scale: float = _get_width_for_tessellated_point(verts, verts_t, tess_idx)
var is_first_point: bool = (vert_idx == first_idx) and not is_edge_contiguous
var is_last_point: bool = (vert_idx == last_idx - 1) and not is_edge_contiguous
var is_first_tess_point: bool = (tess_idx == first_idx_t) and not is_edge_contiguous
var is_last_tess_point: bool = (tess_idx == last_idx_t - 1) and not is_edge_contiguous
var tex: Texture2D = null
var tex_size := Vector2(default_quad_width, default_quad_width)
var fitmode := SS2D_Material_Edge.FITMODE.SQUISH_AND_STRETCH
if edge_material != null:
if edge_material.randomize_texture:
texture_idx = randi() % edge_material.textures.size()
else :
texture_idx = get_point_texture_index(vert_key)
tex = edge_material.get_texture(texture_idx)
tex_size = tex.get_size()
fitmode = edge_material.fit_mode
# Exit if we have an edge material defined but no texture to render
if tex == null:
i += next_point_delta
continue
var new_quad: SS2D_Quad = build_quad_from_two_points(
pt,
pt_next,
tex,
width_scale * c_scale * tex_size.y,
flip_x,
should_flip_edges(),
is_first_point,
is_last_point,
c_offset,
c_extends,
fitmode
)
var new_quads: Array[SS2D_Quad] = []
new_quads.push_back(new_quad)
# Corner Quad
if edge_material != null and edge_material.use_corner_texture:
if tess_idx != first_idx_t or is_edge_contiguous:
var prev_width: float = _get_width_for_tessellated_point(verts, verts_t, tess_idx_prev)
var q: SS2D_Quad = _edge_generate_corner(
pt_prev,
pt,
pt_next,
prev_width,
width_scale,
tex_size.y,
edge_material,
texture_idx,
c_scale,
c_offset
)
if q != null:
new_quads.push_front(q)
# Taper Quad
# Bear in mind, a point can be both first AND last
# Consider an edge that consists of two points (one edge)
# This first point is used to generate the quad; it is both first and last
if is_first_tess_point and edge_material != null and edge_material.use_taper_texture:
var taper_texture: Texture2D = edge_material.get_texture_taper_left(texture_idx)
if taper_texture != null:
var taper_size: Vector2 = taper_texture.get_size()
var fit: bool = absf(taper_size.x) <= new_quad.get_length_average()
if fit:
var taper_quad := new_quad.duplicate()
taper_quad.corner = 0
taper_quad.texture = taper_texture
var delta_normal: Vector2 = (taper_quad.pt_d - taper_quad.pt_a).normalized()
var offset: Vector2 = delta_normal * taper_size
taper_quad.pt_d = taper_quad.pt_a + offset
taper_quad.pt_c = taper_quad.pt_b + offset
new_quad.pt_a = taper_quad.pt_d
new_quad.pt_b = taper_quad.pt_c
new_quads.push_front(taper_quad)
# If a new taper quad doesn't fit, re-texture the new_quad
else:
new_quad.texture = taper_texture
if is_last_tess_point and edge_material != null and edge_material.use_taper_texture:
var taper_texture: Texture2D = edge_material.get_texture_taper_right(texture_idx)
if taper_texture != null:
var taper_size: Vector2 = taper_texture.get_size()
var fit: bool = absf(taper_size.x) <= new_quad.get_length_average()
if fit:
var taper_quad := new_quad.duplicate()
taper_quad.corner = 0
taper_quad.texture = taper_texture
var delta_normal: Vector2 = (taper_quad.pt_d - taper_quad.pt_a).normalized()
var offset: Vector2 = delta_normal * taper_size
taper_quad.pt_a = taper_quad.pt_d - offset
taper_quad.pt_b = taper_quad.pt_c - offset
new_quad.pt_d = taper_quad.pt_a
new_quad.pt_c = taper_quad.pt_b
new_quads.push_back(taper_quad)
# If a new taper quad doesn't fit, re-texture the new_quad
else:
new_quad.texture = taper_texture
# Final point for closed shapes fix
# Corner quads aren't always correctly when the corner is between final and first pt
if is_last_point and is_edge_contiguous:
var idx_mid: int = verts_t.size() - 1
var idx_next: int = _get_next_unique_point_idx(idx_mid, verts_t, true)
var idx_prev: int = _get_previous_unique_point_idx(idx_mid, verts_t, true)
var p_p: Vector2 = verts_t[idx_prev]
var p_m: Vector2 = verts_t[idx_mid]
var p_n: Vector2 = verts_t[idx_next]
var w_p: float = _get_width_for_tessellated_point(verts, verts_t, idx_prev)
var w_m: float = _get_width_for_tessellated_point(verts, verts_t, idx_mid)
var q: SS2D_Quad = _edge_generate_corner(
p_p, p_m, p_n, w_p, w_m, tex_size.y, edge_material, texture_idx, c_scale, c_offset
)
if q != null:
new_quads.push_back(q)
# Add new quads to edge
for q in new_quads:
edge.quads.push_back(q)
i += next_point_delta
if edge_material_meta != null:
if edge_material_meta.weld:
_weld_quad_array(edge.quads, edge.wrap_around)
return edge

@mphe
Copy link
Collaborator

mphe commented May 6, 2023

IIRC this topic came up on discord a while a go but I think nobody further investigated the problem.

My guess is that it might be related to threading in _build_edges() in shape_base.gd.

var threads: Array[Thread] = []
for index_map in index_maps:
var thread := Thread.new()
var args := [index_map, s_mat.render_offset, 0.0]
var priority := 2
thread.start(self._build_edge_with_material_thread_wrapper.bind(args), priority)
threads.push_back(thread)
for thread in threads:
var new_edge: SS2D_Edge = thread.wait_to_finish()
edges.push_back(new_edge)

Can you try to add a print(index_maps.size()) at line 1054 and report how many index maps it processes?
It could be possible that it spins up an unhealthy amount of threads. I'm not sure how Godot 4 handles synchronization, but it could also be that threads block each other, due to accessing shared resources.

You can also try to comment out the threading related lines (1056 - 1060) and add edges.push_back(_build_edge_with_material(index_map, s_mat.render_offset, 0.0)) instead, so that it computes the shape on the main thread instead of spawning multiple worker threads.

@martinboue
Copy link
Contributor Author

Thanks for your response, I'll test this out asap (probably tomorrow) and get back to you with the result.

If this solves the problem, should I submit a PR that removes the threading? Or should I redo the threads management? I'm rather new contributing on GitHub but I would be glad to help ☺️

@mphe
Copy link
Collaborator

mphe commented May 6, 2023

If it actually solves the problem, we can discuss with the other contributors, whether to remove threaded computation or refactor it. This in turn requires further testing where the culprit lies exactly.
So, one step after another :)

@martinboue
Copy link
Contributor Author

I was able to test what you suggested @mphe :

By adding the print(index_maps.size()) it tells me that there are 3 maps for the shape I shared above. Each time the number of maps corresponded to the numbers of SS2D_Material_Edge_Metadata in the shape, so it seems coherent for me.

Also, replacing the threads by a direct call like below, the loading time is worse, I go from 5 seconds to 10:

for index_map in index_maps:
	edges.push_back(_build_edge_with_material(index_map, s_mat.render_offset, 0.0))

@mphe
Copy link
Collaborator

mphe commented May 7, 2023

Ok, good to know that threading is not a problem.

Could you please provide an actual minimal reproduction project and not a a whole game project?

@martinboue
Copy link
Contributor Author

I cleaned up the next branch to leave only the necessary (it was not the case at the beginning):
https://github.com/martinboue/ludum-dare-53/tree/smart-shape-performance

it is no longer a complete game, only two scenes are left: a starting scene with just a button to load the level scene. In these two, I left the bare minimum, only the smart shapes remains in the level.

Will it works for you?

@mphe
Copy link
Collaborator

mphe commented May 7, 2023

Ah, I'm sorry, I cloned the repo but forgot to switch the branch. Yes, it's fine.

I'll try to look into it in the next days.

@limbonaut
Copy link
Collaborator

Around 3 seconds here (main branch).

Instancing level and drawing first frame took 2854 ms

Here's the benchmark code I used:

# SceneTransition
extends CanvasLayer

@onready var anim_player = $AnimationPlayer


func change_scene(scene_path: String) -> void:
#	anim_player.play("fade")
#	await anim_player.animation_finished
	var start_time = Time.get_ticks_msec()
	get_tree().change_scene_to_file(scene_path)
#	anim_player.play_backwards("fade")
#	await anim_player.animation_finished
	await RenderingServer.frame_post_draw
	var end_time := Time.get_ticks_msec()
	print("Instancing level and drawing first frame took %s ms" % [end_time - start_time])

@mphe
Copy link
Collaborator

mphe commented May 12, 2023

I looked into it and timed various parts of the process and the case is clear. The edge generation code is simply an utter mess.
Lots of data is regenerated over and over again instead of caching or reusing the results, e.g. get_vertices(), get_all_point_keys(), get_tesselated_points(), and much more.
Some code could be written in O(n) but is implemented to run in O(n²). For example, get_vertex_idx_from_tessellated_point(), which tries to map a tesselated point to the corresponding, non-tesselated point. This function could be implemented to compute all matches upfront in O(n). Instead, it is called for every tesselated point, increasing the runtime cost to O(n²).
Then I found should_flip_edges(), which alone takes up to 1ms in this test case! This function is also called for every tesseleated point even though the result is constant during edge generation. Now imagine a shape with 1000 tesselated points. You will get 1s of loading time just because of this.
After pulling that function call out of the loop, it reduced the overall time consumption by ~30-40%.
And these are just some examples.

Unfortunately, fixing the performance would require carefully refactoring and optimizing large parts of the generation process, which is not trivial and requires quite a bit of time and effort.
So, it will be more of a longer-term goal to fix the performance.
However, you can get 30-40% loading time reduction for free due to the should_flip_edges() workaround.
I opened a PR here: #120.

@martinboue
Copy link
Contributor Author

Thank you for your time and deep analysis @mphe.

30-40% is a good first step to improve the overall performances. It might be interesting to merge it and then iterate to improve step by step.

What can I do to help? I don't know how the edge generation code works but I'm ready to learn!

@mphe
Copy link
Collaborator

mphe commented May 12, 2023

What can I do to help? I don't know how the edge generation code works but I'm ready to learn!

Read the source and try to roughly understand how the plugin works. plugin.gd contains most of the UI functionality, shape_base contains most of the shape related functionality. _on_dirty_update() regenerates the shape.
Follow every function call and you will eventually see how some stuff is computed over and over again.
Also try to get a grasp of what is actually going on and check if the implementation could be improved.
That's basically it.
If you find something that can be optimized, try to find a way to do so. Usually this is not that simple and might require some refactoring.
Add timers to identify the parts that consume the most time and start there. I have a simple helper class for time measurements here. Feel free to use it or bring your own.
If you make changes, test them and ensure the unit and integration tests pass.
Good luck and have fun!

@limbonaut
Copy link
Collaborator

Nice! Should also improve editing performance which hasn't been stellar so far. Also wanted to mention that the mesh could be cached and saved in scene file during generation and simply rendered at runtime. Are there good reasons not to do it?

@mphe
Copy link
Collaborator

mphe commented May 12, 2023

Isn't that what @remorse107 has been working on?

@limbonaut
Copy link
Collaborator

I believe so. There were some issues with this approach, but if it proves to be a feasible option, it might help address the loading time problem (although it won't resolve the performance issues with the editor).

@martinboue
Copy link
Contributor Author

I initiated another pull request with some other performance improvements: #122

During my tests, I noticed an additional improvement of ~25%.

Note: this is a draft PR, I don't have much time to work on it at the moment, I won't be able to complete this performance refactoring for now.

@limbonaut
Copy link
Collaborator

Is this issue still relevant? Could somebody who had issues with long loading times test the latest version from GitHub?

@mphe
Copy link
Collaborator

mphe commented Jan 18, 2024

I've just tested the test project from top post with master and master + #122.
Loading times became better, but are still 2-4 seconds (on my machine), for a rather simple scene.
#122 adds a noticable speed boost, but @martinboue mentioned there is a failing unit test, so it can't be merged, yet.
Editing performance is still abysmal.
So, the issue still persists I'd say.

@martinboue
Copy link
Contributor Author

Hi, thanks for bringing attention to this issue.

I'm no longer working with SmartShape2D or Godot. The PR is a draft, I think it needs little adjustments but serious tests/checks from an expert. I lack expertise and time to do that.

From what I remember, the PR is "almost stable", all tests pass. The mentioned failing tests are related to a change suggestion that is not included in the PR changes, just a comment suggestion if someone wants to explore the improvement.

@mphe
Copy link
Collaborator

mphe commented Jan 19, 2024

Ah alright, thank you for the work you put into this!
Guess we can salvage your PR and investigate your ideas further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants