-
-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove temporary buffer from AddConvexPolyFilled #1811
Conversation
Hello, Thanks for submitting this. EDIT What was your intent with removing the As a crude performance test, I have bound the two variants of the code on the CTRL key (adding 200 iterations of rendering all the filled shapes: Win32 Release Optimized: Win64 Release Optimized: Win32 Debug Non-Optimized (with MSVC debug cruft) etc.
Also smaller/other things that would need be fixed later:
Thanks, |
About this:
I have added a comment about anti-aliased filling requiring a path in clockwise order. |
Removing alloca is mostly for compatibility. (it's only other use is in AddPolyline, which can be modified in the same way) Performance shouldn't have changed. Doing register-register ops ins't a win when pipelining hides the latency of reading memory. I made a benchmark of processing a rounded rectangle. (not yet pushed) And MSVC's sampling profiler shows 10% of the whole run time is spent on But i'm only measuring the pure run time of the function itself, maybe the fact that the fill triangles are interleaved with the outline degrades performance? If so, it's easy to de-interleave them, it also may be worthwhile to also use a different algorithm that doesn't produce long, thin triangles. |
Which compatibility issue do you have with
I guess it's more reasonable and representative of real use to keep them interleaved. Regardless, a simple manufactured/looping test would probably fit in the instruction cache anyway even with multiple functions running.
What's the problem with long thin triangles? Mildly unrelated, but if you fancy looking at it, the non-AA thick strokes are broken (see #288, one of the oldest PR alive, it was submitted just at the time we switched to AA primitives). |
Sorry for taking so long, i got sidetracked trying (and failing) to install vTune, to see what's really slowing it down. But reading the disassembly i think i know what's going on. In the old version the normal computing loop got unrolled to call Without doing explicit SIMD (intrinsics/ispc) i don't think i can get a (reliable) performance boost. Maybe a compromise with a small fixed size buffer will yield the best result. About the compatibility issue; MinGW has (had?) a problem with properly handling it, causing link errors. And lastly the short version why long thin traingles aren't so great: GPU's first determine which 8x8 tiles a triangle overlaps, in those tiles it applies the same process but for 2x2 groups of pixels, if the traingle is overlapping a group then the shader is executed for all the pixels in that group and pixels outside the triangle are discarded. Additinaly mobile/newer generation GPUs use tile renderers, which add another level with even larger tiles designed to fit in cache. So long thin triangles hit a lot of tiles and groups, but contribute little pixels to the output. |
Ok, i think i will leave it at this. Using the output as a buffer gets slightly worser performance when it's the only thing in cache and slightly better performace when there's cache pressure. Should i do the same for |
0c1e5bd
to
bb6a60b
Compare
While wandering into old issues I noticed you post an interesting update in Dec 2018 which was made as an edit of the original post therefore without any notification coming. We'll investigate that new idea you posted in your edit. |
I forgot to explain the different plotted quantities!
More on compatibility: using I need to merge master into my branch and re-run the benchmarks. But overall I like the "idx reuse" strategy the most because it can't as easily suffer clobbering. Just generate all the vertices first and then the indexes. It's almost a drop-in replacement even for the much more complicated Thank you for taking a look at almost a 3 year idle PR! |
8b83e0a
to
d735066
Compare
c817acb
to
8d39063
Compare
I profiled it with Tracy. Profiler instrumentation was added only around two loops that use This trace is original code. Unsure how to implement total time metrics, maybe they are different because profiling sessions were not exactly equal, but average execution times are nearly same. |
commit 5884219 Author: cfillion <[email protected]> Date: Wed Sep 28 23:37:39 2022 -0400 imgui_freetype: Assert if bitmap size exceed chunk size to avoid buffer overflow. (ocornut#5731) commit f2a522d Author: ocornut <[email protected]> Date: Fri Sep 30 15:43:27 2022 +0200 ImDrawList: Not using alloca() anymore, lift single polygon size limits. (ocornut#5704, ocornut#1811) commit cc5058e Author: ocornut <[email protected]> Date: Thu Sep 29 21:59:32 2022 +0200 IO: Filter duplicate input events during the AddXXX() calls. (ocornut#5599, ocornut#4921) commit fac8295 Author: ocornut <[email protected]> Date: Thu Sep 29 21:31:36 2022 +0200 IO: remove ImGuiInputEvent::IgnoredAsSame (revert part of 839c310), will filter earlier in next commit. (ocornut#5599) Making it a separate commit as this leads to much indentation change. commit 9e7f460 Author: ocornut <[email protected]> Date: Thu Sep 29 20:42:45 2022 +0200 Fixed GetKeyName() for ImGuiMod_XXX values, made invalid MousePos display in log nicer. (ocornut#4921, ocornut#456) Amend fd408c9 commit 0749453 Author: ocornut <[email protected]> Date: Thu Sep 29 19:51:54 2022 +0200 Menus, Nav: Fixed not being able to close a menu with Left arrow when parent is not a popup. (ocornut#5730) commit 9f6aae3 Author: ocornut <[email protected]> Date: Thu Sep 29 19:48:27 2022 +0200 Nav: Fixed race condition pressing Esc during popup opening frame causing crash. commit bd2355a Author: ocornut <[email protected]> Date: Thu Sep 29 19:25:26 2022 +0200 Menus, Nav: Fixed using left/right navigation when appending to an existing menu (multiple BeginMenu() call with same names). (ocornut#1207) commit 3532ed1 Author: ocornut <[email protected]> Date: Thu Sep 29 18:07:35 2022 +0200 Menus, Nav: Fixed keyboard/gamepad navigation occasionally erroneously landing on menu-item in parent when the parent is not a popup. (ocornut#5730) Replace BeginMenu/MenuItem swapping g.NavWindow with a more adequate ImGuiItemFlags_NoWindowHoverableCheck. Expecting more subtle issues to stem from this. Note that NoWindowHoverableCheck is not supported by IsItemHovered() but then IsItemHovered() on BeginMenu() never worked: fix should be easy in BeginMenu() + add test is IsItemHovered(), will do later commit d5d7050 Author: ocornut <[email protected]> Date: Thu Sep 29 17:26:52 2022 +0200 Various comments As it turns out, functions like IsItemHovered() won't work on an open BeginMenu() because LastItemData is overriden by BeginPopup(). Probably an easy fix. commit e74a50f Author: Andrew D. Zonenberg <[email protected]> Date: Wed Sep 28 08:19:34 2022 -0700 Added GetGlyphRangesGreek() helper for Greek & Coptic glyph range. (ocornut#5676, ocornut#5727) commit d17627b Author: ocornut <[email protected]> Date: Wed Sep 28 17:38:41 2022 +0200 InputText: leave state->Flags uncleared for the purpose of backends emitting an on-screen keyboard for passwords. (ocornut#5724) commit 0a7054c Author: ocornut <[email protected]> Date: Wed Sep 28 17:00:45 2022 +0200 Backends: Win32: Convert WM_CHAR values with MultiByteToWideChar() when window class was registered as MBCS (not Unicode). (ocornut#5725, ocornut#1807, ocornut#471, ocornut#2815, ocornut#1060) commit a229a7f Author: ocornut <[email protected]> Date: Wed Sep 28 16:57:09 2022 +0200 Examples: Win32: Always use RegisterClassW() to ensure windows are Unicode. (ocornut#5725) commit e0330c1 Author: ocornut <[email protected]> Date: Wed Sep 28 14:54:38 2022 +0200 Fonts, Text: Fixed wrapped-text not doing a fast-forward on lines above the clipping region. (ocornut#5720) which would result in an abnormal number of vertices created. commit 4d4889b Author: ocornut <[email protected]> Date: Wed Sep 28 12:42:06 2022 +0200 Refactor CalcWordWrapPositionA() to take on the responsability of minimum character display. Add CalcWordWrapNextLineStartA(), simplify caller code. Should be no-op but incrementing IMGUI_VERSION_NUM just in case. Preparing for ocornut#5720 commit 5c4426c Author: ocornut <[email protected]> Date: Wed Sep 28 12:22:34 2022 +0200 Demo: Fixed Log & Console from losing scrolling position with Auto-Scroll when child is clipped. (ocornut#5721) commit 12c0246 Author: ocornut <[email protected]> Date: Wed Sep 28 12:07:43 2022 +0200 Removed support for 1.42-era IMGUI_DISABLE_INCLUDE_IMCONFIG_H / IMGUI_INCLUDE_IMCONFIG_H. (ocornut#255) commit 73efcec Author: ocornut <[email protected]> Date: Tue Sep 27 22:21:47 2022 +0200 Examples: disable GL related warnings on Mac + amend to ignore list. commit a725db1 Author: ocornut <[email protected]> Date: Tue Sep 27 18:47:20 2022 +0200 Comments for flags discoverability + add to debug log (ocornut#3795, ocornut#4559) commit 325299f Author: ocornut <[email protected]> Date: Wed Sep 14 15:46:27 2022 +0200 Backends: OpenGL: Add ability to #define IMGUI_IMPL_OPENGL_DEBUG. (ocornut#4468, ocornut#4825, ocornut#4832, ocornut#5127, ocornut#5655, ocornut#5709) commit 56c3eae Author: ocornut <[email protected]> Date: Tue Sep 27 14:24:21 2022 +0200 ImDrawList: asserting on incorrect value for CurveTessellationTol (ocornut#5713) commit 04316bd Author: ocornut <[email protected]> Date: Mon Sep 26 16:32:09 2022 +0200 ColorEdit3: fixed id collision leading to an assertion. (ocornut#5707) commit c261dac Author: ocornut <[email protected]> Date: Mon Sep 26 14:50:46 2022 +0200 Demo: moved ShowUserGuide() lower in the file, to make main demo entry point more visible + fix using IMGUI_DEBUG_LOG() macros in if/else. commit 51bbc70 Author: ocornut <[email protected]> Date: Mon Sep 26 14:44:26 2022 +0200 Backends: SDL: Disable SDL 2.0.22 new "auto capture" which prevents drag and drop across windows, and don't capture mouse when drag and dropping. (ocornut#5710) commit 7a9045d Author: ocornut <[email protected]> Date: Mon Sep 26 11:55:07 2022 +0200 Backends: WGPU: removed Emscripten version check (currently failing on CI, ensure why, and tbh its redundant/unnecessary with changes of wgpu api nowadays) commit 83a0030 Author: ocornut <[email protected]> Date: Mon Sep 26 10:33:44 2022 +0200 Added ImGuiMod_Shortcut which is ImGuiMod_Super on Mac and ImGuiMod_Ctrl otherwise. (ocornut#456) commit fd408c9 Author: ocornut <[email protected]> Date: Thu Sep 22 18:58:33 2022 +0200 Renamed and merged keyboard modifiers key enums and flags into a same set:. ImGuiKey_ModXXX -> ImGuiMod_XXX and ImGuiModFlags_XXX -> ImGuiMod_XXX. (ocornut#4921, ocornut#456) Changed signature of GetKeyChordName() to use ImGuiKeyChord. Additionally SetActiveIdUsingAllKeyboardKeys() doesn't set ImGuiKey_ModXXX but we never need/use those and the system will be changed in upcoming commits. # Conflicts: # docs/CHANGELOG.txt # imgui.h # imgui_demo.cpp
I was curious to have another look at this today, but it has become very confusing to follow because:
As of today if I look at the leftover diff, the only difference becomes: -- ImVec2* temp_normals = (ImVec2*)alloca(points_count * sizeof(ImVec2))
++ ImVec2* temp_normals = (ImVec2*)((size_t)(_VtxWritePtr + vtx_count) & ~(alignof(ImVec2) - 1)) - points_count; But this doesn't apply the same to current code. Considering the last posts from from April 2022 suggested no measurable difference I think we can close this. |
Math stays the same, only other change is the early exit, to prevent reading a zero sized array when initializing
n0
.When i have it in my head: could it also auto detect winding order?
Currently it assumes clockwise winding, which is ok, but isn't documented anywhere.
Long overdue update:
Firt i want to say that your code has pretty much optimal performace, as in number of operations and throughput. That's why my first proposal failed abysmally, it both needed to compute one more sqrt, which matters when half of the inputs have 3 or 4 points, and was painfully serial.
So an intermediate buffer is necassary. But what about alloca? Well the key realization is that while another buffer is needed, more memory is not. That's what i did last year, re-use the vertex buffer to also store the normals. What i didn't do, and which prevented me from matching the original performace, was this:
ImVec2* temp_normals = (ImVec2*)((size_t)(_VtxWritePtr + vtx_count) & ~(alignof(ImVec2) - 1)) - points_count;
Which is again reusing the vertex buffer, but now it's nice, transparent and tightly packed.
My question to you is: "Should ImGui do this type of memory aliasing?".
(and, ugh, how to do it in a portable way - just say that ImVec2 is always 4 byte aligned?)
I also tried reusing the vertex buffer in the same way, and just inlining all the vector operators in the original.
Performance testing:
Using example_null as a base, a loop of NewFrame, 3x ShowDemoWindow, Render. Additonaly ImDrawList::AddPolyline and ImFont::RenderText are stubbed out to amplify the effect of AddConvexPolyFilled. The loop runs until 5 seconds have passed (batched to avoid the overhead of getting current time). Both Visual Studio 2017 and 2015 are tested (v141 and v140)
Signed,
an annoying GitHub user.