Long loading time with large shapes #119

martinboue · 2023-05-05T18:59:11Z

Context:
Version: 3bc77ac
Godot version: 4.0.2

Issue:

I encounter long loading times, from 5 to 15 seconds depending on the computer, when loading a scene with large shapes (mainly SS2D_Shape_Closed). This happens when opening the scene in the editor or when loading the scene in game.

I have about ten shapes with a total of 1024 points spread over an area of 32,000x32,000px. Most of them have curved edges. All shapes have a shape material and most of them have multiple textures (edge or fill).

If I remove the shape material from all these shapes, I no longer have a performance issue.

Here is an example of one of the ten shapes:

I can upload more configuration screenshots if needed.

How to reproduce:

I have made a branch on my project and clean up the code for a minimal reproduction:
https://github.com/martinboue/ludum-dare-53/tree/smart-shape-performance
You can start the project and click "Load level". The problematic scene can be found at "res://src/level/level.tscn".

Suggestions

Is there anything I can do differently on my side to improve performance?

Smart Shape seems to do some calculations when loading, could these calculations be saved/cached?

Thanks!

EDIT: I'm ready to work on a PR if needed, but I would need some help to go in the right direction (I don't know how the addon works internally)

The text was updated successfully, but these errors were encountered:

martinboue · 2023-05-06T07:40:27Z

Here is the profiler graph when loading the scene, we can see the Process time at 4424ms which cause the freeze when loading the scene.

Script functions do not take much time:

Going step by step, I see that it is the _build_edge_with_material method of shape_base.gd that is taking a long time. It is called by shape_base.gd via _process > _on_dirty_update > cache_edges (shape_closed.gd) > _build_edges > _build_edge_with_material_thread_wrapper > _build_edge_with_material :

SmartShape2D/addons/rmsmartshape/shapes/shape_base.gd

Lines 1440 to 1632 in 3bc77ac

    
           func _build_edge_with_material( 
        
           	index_map: SS2D_IndexMap,  c_offset: float, default_quad_width: float 
        
           ) -> SS2D_Edge: 
        
           	var verts_t: PackedVector2Array = get_tessellated_points() 
        
           	var verts: PackedVector2Array = get_vertices() 
        
           	var edge := SS2D_Edge.new() 
        
           	var is_edge_contiguous: bool = _is_edge_contiguous(index_map, verts) 
        
           	edge.wrap_around = is_edge_contiguous 
        
           	if not index_map.is_valid(): 
        
           		return edge 
        
           	var c_scale := 1.0 
        
           	var c_extends := 0.0 
        
           	var edge_material_meta: SS2D_Material_Edge_Metadata = null 
        
           	var edge_material: SS2D_Material_Edge = null 
        
           	if index_map.object != null: 
        
           		edge_material_meta = index_map.object 
        
           		if edge_material_meta == null: 
        
           			return edge 
        
           		if not edge_material_meta.render: 
        
           			return edge 
        
           		edge_material = edge_material_meta.edge_material 
        
           		if edge_material == null: 
        
           			return edge 
        
           		c_offset += edge_material_meta.offset 
        
           		edge.z_index = edge_material_meta.z_index 
        
           		edge.z_as_relative = edge_material_meta.z_as_relative 
        
           		edge.material = edge_material_meta.edge_material.material 
        
           	var first_idx: int = index_map.indicies[0] 
        
           	var last_idx: int = index_map.indicies[-1] 
        
           	var first_idx_t: int = get_tessellated_idx_from_point(verts, verts_t, first_idx) 
        
           	var last_idx_t: int = get_tessellated_idx_from_point(verts, verts_t, last_idx) 
        
           	edge.first_point_key = _points.get_point_key_at_index(first_idx) 
        
           	edge.last_point_key = _points.get_point_key_at_index(last_idx) 
        
           	# How many tessellated points are contained within this index map? 
        
           	var tess_point_count: int = _edge_data_get_tess_point_count(index_map) 
        
           	var i := 0 
        
           	while i < tess_point_count: 
        
           		var tess_idx: int = (first_idx_t + i) % verts_t.size() 
        
           		var tess_idx_next: int = _get_next_unique_point_idx(tess_idx, verts_t, true) 
        
           		var tess_idx_prev: int = _get_previous_unique_point_idx(tess_idx, verts_t, true) 
        
           		# set next_point_delta 
        
           		# next_point_delta is the number of tess_pts from 
        
           		# the current tess_pt to the next unique tess_pt 
        
           		# unique meaning it has a different position from the current tess_pt 
        
           		var next_point_delta := 0 
        
           		for j in range(verts_t.size()): 
        
           			if ((tess_idx + j) % verts_t.size()) == tess_idx_next: 
        
           				next_point_delta = j 
        
           				break 
        
           		var vert_idx: int = get_vertex_idx_from_tessellated_point( 
        
           				verts, verts_t, tess_idx) 
        
           		var vert_key: int = get_point_key_at_index(vert_idx) 
        
           		var pt: Vector2 = verts_t[tess_idx] 
        
           		var pt_next: Vector2 = verts_t[tess_idx_next] 
        
           		var pt_prev: Vector2 = verts_t[tess_idx_prev] 
        
           		var texture_idx := 0 
        
           		var flip_x: bool = get_point_texture_flip(vert_key) 
        
           		var width_scale: float = _get_width_for_tessellated_point(verts, verts_t, tess_idx) 
        
           		var is_first_point: bool = (vert_idx == first_idx) and not is_edge_contiguous 
        
           		var is_last_point: bool = (vert_idx == last_idx - 1) and not is_edge_contiguous 
        
           		var is_first_tess_point: bool = (tess_idx == first_idx_t) and not is_edge_contiguous 
        
           		var is_last_tess_point: bool = (tess_idx == last_idx_t - 1) and not is_edge_contiguous 
        
           		var tex: Texture2D = null 
        
           		var tex_size := Vector2(default_quad_width, default_quad_width) 
        
           		var fitmode := SS2D_Material_Edge.FITMODE.SQUISH_AND_STRETCH 
        
           		if edge_material != null: 
        
           			if edge_material.randomize_texture: 
        
           				texture_idx = randi() % edge_material.textures.size() 
        
           			else : 
        
           				texture_idx = get_point_texture_index(vert_key) 
        
           			tex = edge_material.get_texture(texture_idx) 
        
           			tex_size = tex.get_size() 
        
           			fitmode = edge_material.fit_mode 
        
           			# Exit if we have an edge material defined but no texture to render 
        
           			if tex == null: 
        
           				i += next_point_delta 
        
           				continue 
        
           		var new_quad: SS2D_Quad = build_quad_from_two_points( 
        
           			pt, 
        
           			pt_next, 
        
           			tex, 
        
           			width_scale * c_scale * tex_size.y, 
        
           			flip_x, 
        
           			should_flip_edges(), 
        
           			is_first_point, 
        
           			is_last_point, 
        
           			c_offset, 
        
           			c_extends, 
        
           			fitmode 
        
           		) 
        
           		var new_quads: Array[SS2D_Quad] = [] 
        
           		new_quads.push_back(new_quad) 
        
           		# Corner Quad 
        
           		if edge_material != null and edge_material.use_corner_texture: 
        
           			if tess_idx != first_idx_t or is_edge_contiguous: 
        
           				var prev_width: float = _get_width_for_tessellated_point(verts, verts_t, tess_idx_prev) 
        
           				var q: SS2D_Quad = _edge_generate_corner( 
        
           					pt_prev, 
        
           					pt, 
        
           					pt_next, 
        
           					prev_width, 
        
           					width_scale, 
        
           					tex_size.y, 
        
           					edge_material, 
        
           					texture_idx, 
        
           					c_scale, 
        
           					c_offset 
        
           				) 
        
           				if q != null: 
        
           					new_quads.push_front(q) 
        
           		# Taper Quad 
        
           		# Bear in mind, a point can be both first AND last 
        
           		# Consider an edge that consists of two points (one edge) 
        
           		# This first point is used to generate the quad; it is both first and last 
        
           		if is_first_tess_point and edge_material != null and edge_material.use_taper_texture: 
        
           			var taper_texture: Texture2D = edge_material.get_texture_taper_left(texture_idx) 
        
           			if taper_texture != null: 
        
           				var taper_size: Vector2 = taper_texture.get_size() 
        
           				var fit: bool = absf(taper_size.x) <= new_quad.get_length_average() 
        
           				if fit: 
        
           					var taper_quad := new_quad.duplicate() 
        
           					taper_quad.corner = 0 
        
           					taper_quad.texture = taper_texture 
        
           					var delta_normal: Vector2 = (taper_quad.pt_d - taper_quad.pt_a).normalized() 
        
           					var offset: Vector2 = delta_normal * taper_size 
        
           					taper_quad.pt_d = taper_quad.pt_a + offset 
        
           					taper_quad.pt_c = taper_quad.pt_b + offset 
        
           					new_quad.pt_a = taper_quad.pt_d 
        
           					new_quad.pt_b = taper_quad.pt_c 
        
           					new_quads.push_front(taper_quad) 
        
           				# If a new taper quad doesn't fit, re-texture the new_quad 
        
           				else: 
        
           					new_quad.texture = taper_texture 
        
           		if is_last_tess_point and edge_material != null and edge_material.use_taper_texture: 
        
           			var taper_texture: Texture2D = edge_material.get_texture_taper_right(texture_idx) 
        
           			if taper_texture != null: 
        
           				var taper_size: Vector2 = taper_texture.get_size() 
        
           				var fit: bool = absf(taper_size.x) <= new_quad.get_length_average() 
        
           				if fit: 
        
           					var taper_quad := new_quad.duplicate() 
        
           					taper_quad.corner = 0 
        
           					taper_quad.texture = taper_texture 
        
           					var delta_normal: Vector2 = (taper_quad.pt_d - taper_quad.pt_a).normalized() 
        
           					var offset: Vector2 = delta_normal * taper_size 
        
           					taper_quad.pt_a = taper_quad.pt_d - offset 
        
           					taper_quad.pt_b = taper_quad.pt_c - offset 
        
           					new_quad.pt_d = taper_quad.pt_a 
        
           					new_quad.pt_c = taper_quad.pt_b 
        
           					new_quads.push_back(taper_quad) 
        
           				# If a new taper quad doesn't fit, re-texture the new_quad 
        
           				else: 
        
           					new_quad.texture = taper_texture 
        
           		# Final point for closed shapes fix 
        
           		# Corner quads aren't always correctly when the corner is between final and first pt 
        
           		if is_last_point and is_edge_contiguous: 
        
           			var idx_mid: int = verts_t.size() - 1 
        
           			var idx_next: int = _get_next_unique_point_idx(idx_mid, verts_t, true) 
        
           			var idx_prev: int = _get_previous_unique_point_idx(idx_mid, verts_t, true) 
        
           			var p_p: Vector2 = verts_t[idx_prev] 
        
           			var p_m: Vector2 = verts_t[idx_mid] 
        
           			var p_n: Vector2 = verts_t[idx_next] 
        
           			var w_p: float = _get_width_for_tessellated_point(verts, verts_t, idx_prev) 
        
           			var w_m: float = _get_width_for_tessellated_point(verts, verts_t, idx_mid) 
        
           			var q: SS2D_Quad = _edge_generate_corner( 
        
           				p_p, p_m, p_n, w_p, w_m, tex_size.y, edge_material, texture_idx, c_scale, c_offset 
        
           			) 
        
           			if q != null: 
        
           				new_quads.push_back(q) 
        
           		# Add new quads to edge 
        
           		for q in new_quads: 
        
           			edge.quads.push_back(q) 
        
           		i += next_point_delta 
        
           	if edge_material_meta != null: 
        
           		if edge_material_meta.weld: 
        
           			_weld_quad_array(edge.quads, edge.wrap_around) 
        
           	return edge

mphe · 2023-05-06T11:39:26Z

IIRC this topic came up on discord a while a go but I think nobody further investigated the problem.

My guess is that it might be related to threading in _build_edges() in shape_base.gd.

SmartShape2D/addons/rmsmartshape/shapes/shape_base.gd

Lines 1054 to 1063 in 3bc77ac

    
           var threads: Array[Thread] = [] 
        
           for index_map in index_maps: 
        
           	var thread := Thread.new() 
        
           	var args := [index_map, s_mat.render_offset, 0.0] 
        
           	var priority := 2 
        
           	thread.start(self._build_edge_with_material_thread_wrapper.bind(args), priority) 
        
           	threads.push_back(thread) 
        
           for thread in threads: 
        
           	var new_edge: SS2D_Edge = thread.wait_to_finish() 
        
           	edges.push_back(new_edge)

Can you try to add a print(index_maps.size()) at line 1054 and report how many index maps it processes?
It could be possible that it spins up an unhealthy amount of threads. I'm not sure how Godot 4 handles synchronization, but it could also be that threads block each other, due to accessing shared resources.

You can also try to comment out the threading related lines (1056 - 1060) and add edges.push_back(_build_edge_with_material(index_map, s_mat.render_offset, 0.0)) instead, so that it computes the shape on the main thread instead of spawning multiple worker threads.

martinboue · 2023-05-06T14:42:04Z

Thanks for your response, I'll test this out asap (probably tomorrow) and get back to you with the result.

If this solves the problem, should I submit a PR that removes the threading? Or should I redo the threads management? I'm rather new contributing on GitHub but I would be glad to help ☺️

mphe · 2023-05-06T15:41:31Z

If it actually solves the problem, we can discuss with the other contributors, whether to remove threaded computation or refactor it. This in turn requires further testing where the culprit lies exactly.
So, one step after another :)

martinboue · 2023-05-07T13:02:23Z

I was able to test what you suggested @mphe :

By adding the print(index_maps.size()) it tells me that there are 3 maps for the shape I shared above. Each time the number of maps corresponded to the numbers of SS2D_Material_Edge_Metadata in the shape, so it seems coherent for me.

Also, replacing the threads by a direct call like below, the loading time is worse, I go from 5 seconds to 10:

for index_map in index_maps:
	edges.push_back(_build_edge_with_material(index_map, s_mat.render_offset, 0.0))

mphe · 2023-05-07T17:52:42Z

Ok, good to know that threading is not a problem.

Could you please provide an actual minimal reproduction project and not a a whole game project?

martinboue · 2023-05-07T19:22:14Z

I cleaned up the next branch to leave only the necessary (it was not the case at the beginning):
https://github.com/martinboue/ludum-dare-53/tree/smart-shape-performance

it is no longer a complete game, only two scenes are left: a starting scene with just a button to load the level scene. In these two, I left the bare minimum, only the smart shapes remains in the level.

Will it works for you?

mphe · 2023-05-07T19:37:45Z

Ah, I'm sorry, I cloned the repo but forgot to switch the branch. Yes, it's fine.

I'll try to look into it in the next days.

limbonaut · 2023-05-07T20:21:59Z

Around 3 seconds here (main branch).

Instancing level and drawing first frame took 2854 ms

Here's the benchmark code I used:

# SceneTransition
extends CanvasLayer

@onready var anim_player = $AnimationPlayer


func change_scene(scene_path: String) -> void:
#	anim_player.play("fade")
#	await anim_player.animation_finished
	var start_time = Time.get_ticks_msec()
	get_tree().change_scene_to_file(scene_path)
#	anim_player.play_backwards("fade")
#	await anim_player.animation_finished
	await RenderingServer.frame_post_draw
	var end_time := Time.get_ticks_msec()
	print("Instancing level and drawing first frame took %s ms" % [end_time - start_time])

mphe · 2023-05-12T12:40:49Z

I looked into it and timed various parts of the process and the case is clear. The edge generation code is simply an utter mess.
Lots of data is regenerated over and over again instead of caching or reusing the results, e.g. get_vertices(), get_all_point_keys(), get_tesselated_points(), and much more.
Some code could be written in O(n) but is implemented to run in O(n²). For example, get_vertex_idx_from_tessellated_point(), which tries to map a tesselated point to the corresponding, non-tesselated point. This function could be implemented to compute all matches upfront in O(n). Instead, it is called for every tesselated point, increasing the runtime cost to O(n²).
Then I found should_flip_edges(), which alone takes up to 1ms in this test case! This function is also called for every tesseleated point even though the result is constant during edge generation. Now imagine a shape with 1000 tesselated points. You will get 1s of loading time just because of this.
After pulling that function call out of the loop, it reduced the overall time consumption by ~30-40%.
And these are just some examples.

Unfortunately, fixing the performance would require carefully refactoring and optimizing large parts of the generation process, which is not trivial and requires quite a bit of time and effort.
So, it will be more of a longer-term goal to fix the performance.
However, you can get 30-40% loading time reduction for free due to the should_flip_edges() workaround.
I opened a PR here: #120.

martinboue · 2023-05-12T15:37:14Z

Thank you for your time and deep analysis @mphe.

30-40% is a good first step to improve the overall performances. It might be interesting to merge it and then iterate to improve step by step.

What can I do to help? I don't know how the edge generation code works but I'm ready to learn!

mphe · 2023-05-12T17:54:48Z

What can I do to help? I don't know how the edge generation code works but I'm ready to learn!

Read the source and try to roughly understand how the plugin works. plugin.gd contains most of the UI functionality, shape_base contains most of the shape related functionality. _on_dirty_update() regenerates the shape.
Follow every function call and you will eventually see how some stuff is computed over and over again.
Also try to get a grasp of what is actually going on and check if the implementation could be improved.
That's basically it.
If you find something that can be optimized, try to find a way to do so. Usually this is not that simple and might require some refactoring.
Add timers to identify the parts that consume the most time and start there. I have a simple helper class for time measurements here. Feel free to use it or bring your own.
If you make changes, test them and ensure the unit and integration tests pass.
Good luck and have fun!

limbonaut · 2023-05-12T18:40:01Z

Nice! Should also improve editing performance which hasn't been stellar so far. Also wanted to mention that the mesh could be cached and saved in scene file during generation and simply rendered at runtime. Are there good reasons not to do it?

mphe · 2023-05-12T23:50:10Z

Isn't that what @remorse107 has been working on?

limbonaut · 2023-05-13T02:18:03Z

I believe so. There were some issues with this approach, but if it proves to be a feasible option, it might help address the loading time problem (although it won't resolve the performance issues with the editor).

martinboue · 2023-05-21T13:14:29Z

I initiated another pull request with some other performance improvements: #122

During my tests, I noticed an additional improvement of ~25%.

Note: this is a draft PR, I don't have much time to work on it at the moment, I won't be able to complete this performance refactoring for now.

limbonaut · 2024-01-17T18:10:01Z

Is this issue still relevant? Could somebody who had issues with long loading times test the latest version from GitHub?

mphe · 2024-01-18T18:09:45Z

I've just tested the test project from top post with master and master + #122.
Loading times became better, but are still 2-4 seconds (on my machine), for a rather simple scene.
#122 adds a noticable speed boost, but @martinboue mentioned there is a failing unit test, so it can't be merged, yet.
Editing performance is still abysmal.
So, the issue still persists I'd say.

martinboue · 2024-01-18T22:09:20Z

Hi, thanks for bringing attention to this issue.

I'm no longer working with SmartShape2D or Godot. The PR is a draft, I think it needs little adjustments but serious tests/checks from an expert. I lack expertise and time to do that.

From what I remember, the PR is "almost stable", all tests pass. The mentioned failing tests are related to a change suggestion that is not included in the PR changes, just a comment suggestion if someone wants to explore the improvement.

mphe · 2024-01-19T19:21:15Z

Ah alright, thank you for the work you put into this!
Guess we can salvage your PR and investigate your ideas further.

mphe mentioned this issue May 12, 2023

Performance improvements and refactoring #120

Merged

martinboue mentioned this issue May 21, 2023

Refacto vertex index computing for performances #122

Closed

mphe mentioned this issue Jun 16, 2024

Optimize collision generation performance #159

Merged

limbonaut closed this as completed in #159 Jun 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Long loading time with large shapes #119

Long loading time with large shapes #119

martinboue commented May 5, 2023 •

edited

Loading

martinboue commented May 6, 2023 •

edited

Loading

mphe commented May 6, 2023

martinboue commented May 6, 2023

mphe commented May 6, 2023 •

edited

Loading

martinboue commented May 7, 2023

mphe commented May 7, 2023

martinboue commented May 7, 2023

mphe commented May 7, 2023

limbonaut commented May 7, 2023

mphe commented May 12, 2023

martinboue commented May 12, 2023

mphe commented May 12, 2023

limbonaut commented May 12, 2023

mphe commented May 12, 2023

limbonaut commented May 13, 2023

martinboue commented May 21, 2023

limbonaut commented Jan 17, 2024

mphe commented Jan 18, 2024

martinboue commented Jan 18, 2024

mphe commented Jan 19, 2024

Long loading time with large shapes #119

Long loading time with large shapes #119

Comments

martinboue commented May 5, 2023 • edited Loading

Issue:

How to reproduce:

Suggestions

martinboue commented May 6, 2023 • edited Loading

mphe commented May 6, 2023

martinboue commented May 6, 2023

mphe commented May 6, 2023 • edited Loading

martinboue commented May 7, 2023

mphe commented May 7, 2023

martinboue commented May 7, 2023

mphe commented May 7, 2023

limbonaut commented May 7, 2023

mphe commented May 12, 2023

martinboue commented May 12, 2023

mphe commented May 12, 2023

limbonaut commented May 12, 2023

mphe commented May 12, 2023

limbonaut commented May 13, 2023

martinboue commented May 21, 2023

limbonaut commented Jan 17, 2024

mphe commented Jan 18, 2024

martinboue commented Jan 18, 2024

mphe commented Jan 19, 2024

martinboue commented May 5, 2023 •

edited

Loading

martinboue commented May 6, 2023 •

edited

Loading

mphe commented May 6, 2023 •

edited

Loading