Windows: main thread is blocked when user resizes or moves a window #1059
Comments
Fix it already, it's been 8 years!
Making the executive decision to close this bug as wontfix; this isn't worth all the known problems and unknown risks that fixing it would cause.
How is there any risk in making it so you can drag a window without it pausing the program?
So the fundamental issue is that the way SDL gives you events is at odds with how Win32 wants your program to handle events. What you're supposed to do (according to Win32) is have a "window procedure" (callback) that runs for each event. SDL provides this callback for you, but the SDL callback just records events into the event queue for you to respond to later. One of the events that you're supposed to respond to during your callback is a repaint event. SDL can't repaint for you, but usually this isn't an issue, because shortly after SDL puts all the events in the queue, you grab events from the queue and repaint yourself. The problem is that while the user is holding the mouse button down during a resize of the window, control never returns to the main program. The user32 event loop just holds on to your program's control flow and continually calls your window procedure, giving you resize-related events and paint events. If you respond to the paint events by painting immediately within the window procedure, then you'll get a program that behaves "properly" during a resize. However, this runs totally counter to SDL's event queue system. The only way to fix this is to entirely replace one of SDL's core components. In other words, the
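A toy, platform-neutral C model of the control flow described above may help. It does not call Win32 or SDL: `window_proc`, `modal_size_move_loop`, `app_frame`, and the two message kinds are hypothetical stand-ins for the SDL-installed window procedure, the user32 modal loop, and the application's main loop. What it shows is that while the modal loop runs, events keep piling up in the queue but the application's own loop gets zero iterations.

```c
#include <stddef.h>

/* Hypothetical message kinds standing in for WM_SIZE / WM_PAINT. */
typedef enum { MSG_SIZE, MSG_PAINT } msg_kind;

#define QUEUE_CAP 256 /* arbitrary capacity for this model */

/* A bounded FIFO, standing in for SDL's internal event queue. */
typedef struct {
    msg_kind items[QUEUE_CAP];
    size_t count;
    size_t dropped; /* events lost once the queue is full */
} event_queue;

/* Stand-in for the window procedure SDL installs: it only records events. */
static void window_proc(event_queue *q, msg_kind m) {
    if (q->count < QUEUE_CAP) {
        q->items[q->count++] = m;
    } else {
        q->dropped++;
    }
}

/* Stand-in for the OS modal size/move loop: it keeps calling the window
 * procedure and does not return until the user releases the mouse
 * (modelled here as a fixed number of mouse-move steps). */
static void modal_size_move_loop(event_queue *q, int steps) {
    for (int i = 0; i < steps; i++) {
        window_proc(q, MSG_SIZE);
        window_proc(q, MSG_PAINT);
    }
}

/* The application's frame: drains the queue, "renders" once, and returns
 * how many queued events it handled. */
static int app_frame(event_queue *q, int *frames) {
    int handled = (int)q->count;
    q->count = 0;
    (*frames)++;
    return handled;
}
```

After `modal_size_move_loop(&q, 50)` the queue holds 100 events and no frame has run; only the next `app_frame` call sees them, by which point the paint requests are stale.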
This thread lists multiple problems and potential future incompatibilities.
However, you're welcome to use the attached patch in your code, if you're comfortable with the drawbacks.
Thanks for the honest answers, everyone. Sorry I was rude.
There needs to be some kind of SDL hint, or something along those lines, to fix this behavior, because this is making SDL2 borderline unusable for games that have lockstep netcode. (One person decides to drag or resize their window, and the whole session dies, pissing off all of the players trying to play. YAY!) One shouldn't need to rely on a patch that is old and insanely hard to find (I'd been searching for such a thing for weeks and only found it now) just to get past such an obvious and horrid issue. Not to mention that said patch likely can't even be merged with current SDL2 anymore due to its age. I've been struggling with this problem for years over multiple projects, and I'm tired, frustrated, and fucking desperate for something, anything, that can remedy it.
What happens to the other players when someone unplugs their network cable in this scenario?
@icculus Unplugging your network cable as the host of a multiplayer game would indeed disconnect everyone else playing the game (if you're not the host, as in a client/server model, it would at minimum disconnect yourself). But that's completely expected by the user who unplugged their network cable. Both the client/server and the p2p results of that action are well understood by the user and by the game dev, and as game devs we can add a message like "The host has disconnected", from which users can figure out in a crystal-clear manner that because Robert unplugged his network cable, and Robert was (presumably) the host in the p2p game, everyone got disconnected. In other words, no bug tickets for us, the dev team, because the users fully understood exactly what happened. Having everyone (or even just yourself) disconnect just because you dragged a window is very subtle and frustrating, and it would be difficult for the user to even realize that the dragging caused the issue, as opposed to just thinking your application is sucky. It took me, as the dev, countless hours of debugging to realize that my application client was disconnecting from the server every once in a while because I was dragging the window. Dragging the window just isn't something that I interpret as an action that could affect my application; it's just a subconscious thing I do to keep things placed well. Additionally, for me, dragging the window only disconnected the client from the server about a quarter of the time, which made it even harder to make that association. It just looked like a completely random bug that we couldn't reproduce consistently for the longest time, but it made the application somewhat annoying to use for long periods. And it's not like our users ever reported that they were dragging the window when it happened; they had no idea how to replicate it either, and it just happened randomly from their point of view.
Once we figured out the association it wasn't hard to find this GitHub issue, but something better can be done here. vvvvvvvvvvvvvvvvvvvvvvv IMO, at the absolute minimum, the documentation of SDL_PollEvent desperately needs to say that it will block if the user drags or resizes the window on Windows. Then at least developers can work around the issue and maintain network connections on another thread without an unnecessarily large refactor after the fact (as it was for us). ^^^^^^^^^^^^^^^^^^^^
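As a minimal sketch of the "maintain network connections on another thread" workaround (POSIX threads; every name here is hypothetical and none of it is SDL API), a keepalive loop can run on its own thread so it keeps ticking even while the thread that calls SDL_PollEvent is stuck in a window drag. In a real SDL2 program `SDL_CreateThread` is the portable way to start such a thread, and the worker should not call SDL rendering functions or pump events.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical state for a background "keep the connection alive" thread. */
typedef struct {
    pthread_t thread;
    atomic_bool running;
    atomic_int beats;  /* number of heartbeats "sent" so far */
    long interval_ms;
} heartbeat;

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *heartbeat_main(void *arg) {
    heartbeat *hb = arg;
    while (atomic_load(&hb->running)) {
        /* A real client would send a keepalive packet here. */
        atomic_fetch_add(&hb->beats, 1);
        sleep_ms(hb->interval_ms);
    }
    return NULL;
}

/* Starts the worker; returns 0 on success (pthread_create's result). */
static int heartbeat_start(heartbeat *hb, long interval_ms) {
    hb->interval_ms = interval_ms;
    atomic_init(&hb->beats, 0);
    atomic_init(&hb->running, true);
    return pthread_create(&hb->thread, NULL, heartbeat_main, hb);
}

/* Stops and joins the worker, returning how many beats it sent. */
static int heartbeat_stop(heartbeat *hb) {
    atomic_store(&hb->running, false);
    pthread_join(hb->thread, NULL);
    return atomic_load(&hb->beats);
}
```

In use, a `sleep_ms(200)` on the main thread can stand in for it being blocked by the modal loop; the heartbeat counter keeps advancing meanwhile. Any state the worker shares with game logic needs its own synchronization.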
^ This. Very much this. One player (not even the host) was dragging their window in a match I had and everyone was confused as to why everyone was suddenly lagging. (To be specific, I'm working on a netplay-centric port of Duke Nukem 3D, which uses a master/slave lockstep form of networking, so if ANYONE so much as sneezes on their window, it'll hang the whole match until the operation is done, and add a bunch of persistent lag over the next minute or two as the input lag buffer gets inflated to hell and back to compensate.) Wouldn't be the first time this has happened, either. Adding to my frustration and abrasive demeanour right now is getting blamed for it and/or being told my port sucks because of something out of my control.
The entire game hangs for everyone, and they have to quit. It doesn't matter if it's the host or a client. (The unfortunate downside to lockstep netcode.)
Would it be possible to set a custom WindowProc function on the window that receives WM_MOVE, etc., and handles whatever updates need to be done on the application side before passing the events off to SDL's WindowProc?
@TerminX See https://stackoverflow.com/questions/32294913/getting-contiunous-window-resize-event-in-sdl-2 for something of this sort which uses
I wrote one of the first UDP implementations for Duke3D back in the day, so I totally get this. But the fragility of Duke's system is going to bite you sooner or later, window dragging or not. The extremely non-trivial but correct approach would be to replace that netcode with something more robust... but dear lord, that would be a painful effort. Some other approaches to try:
Thankfully I've spent a few years at this point refactoring the whole thing, and it's in a much better state than in the old days. It's basically impossible to go out of sync now unless someone makes a mod with faulty behaviour, like making RNG calls during display events. If the network is suffering packet loss or extreme latency, it just waits before advancing (however, if there's a full connection loss, it'll stay waiting forever, though menus and such still work; this is what currently happens if someone drags their window for too long). That's unlike DOS Duke, which would often just have a massive hernia and then continue while remaining out of sync, fully locking up once you attempted to quit or start a new game. The prediction code is also in the process of being completely overhauled, and the plan is to implement a full rollback system and in-game joining at some point as well. Failing that, I do have a WIP client/server branch which is partially functional, but buggy as shit simply due to how Duke3D was designed. Really, the only major problem I'm suffering with now is window events. Hopefully, with these functions listed, I can figure something out. Thanks.
@slouken I read the discussion above and the patch. My apologies if I got it wrong, but here are my takeaways. Why the patch doesn't look terribly useful: from what I can tell from the comments, the patch completely replaces the regular resizing with a "manual" one that breaks default desktop handling like window snapping. (Is that correct?) To me, that sounds like a fundamentally unhelpful approach. I also think that the redraw issue really is the secondary problem here, so I don't see the point in getting stuck on it if it's so hard; the patch seems like a dead end. What I would suggest instead: why can't we have a "let me do non-UI app processing" callback that is guaranteed to still run on the main thread, but is banned from calling any SDL2 event/draw functions? This way one can do a nested call to, e.g., netcode, audio, or physics updates to keep things running while just skipping drawing and input processing. I think this would fix the pressing issue of total functionality drop-outs like netcode desync, internet connection losses, complete cutscene audio desync, and so on, while hopefully being far more feasible for SDL2 to provide. The original issue title talks about the blocked main thread, after all, and I agree that's the much bigger problem, especially for multiplayer. Edit: additional note: it would also most likely be far easier for many code bases to make use of such a callback if it stays on the main thread than to try to move their entire gameplay to a separate thread. It's just a different magnitude of headaches. So while it might not seem like much to work with, it could really help this situation massively. Edit2: #1059 (comment) also sounds very similar to what I am suggesting. I'd just prefer a proper, documented solution. It can still be marked as experimental. What about In conclusion, I don't see much value in testing the patch. But is such a callback perhaps more feasible?
If yes, could this issue be reopened to reconsider it? It won't fix the redraw, but I really think the discussion got too sidetracked on that.
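The proposed callback can be modelled in plain C to make the idea concrete. Nothing here is SDL API: `tick_hook`, `modal_loop_with_hook`, and `advance_netcode` are hypothetical, and the "timer every N messages" cadence is an assumption. On Win32 a real implementation would likely need a timer (for example `SetTimer`) during the modal loop, since that loop only calls back into the application when a message arrives.

```c
#include <stddef.h>

/* Hypothetical callback type: called on the main thread while the OS
 * modal loop is running. It must not touch rendering or event APIs. */
typedef void (*background_tick_fn)(void *userdata);

typedef struct {
    background_tick_fn tick; /* NULL means "no hook registered" */
    void *userdata;
} tick_hook;

/* Stand-in for a modal loop that sees a timer message every `timer_every`
 * messages and, on each one, invokes the registered hook on the same thread. */
static void modal_loop_with_hook(const tick_hook *hook, int messages, int timer_every) {
    for (int i = 1; i <= messages; i++) {
        /* ...the window procedure would record size/move messages here... */
        if (hook->tick != NULL && i % timer_every == 0) {
            hook->tick(hook->userdata);
        }
    }
}

/* Example non-UI work: advance a simulation / netcode tick counter. */
static void advance_netcode(void *userdata) {
    int *ticks = userdata;
    (*ticks)++;
}
```

With 100 messages and a timer every 10, the hook runs 10 times; with no hook registered, the loop behaves exactly as before, which is the backwards-compatible property the comment asks for.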
You're welcome to create a callback approach, but please create a new issue and/or pull request for that, since it's fundamentally different from this one.
Are you worried about other platforms? This issue only deals with Windows, but similar things can happen for other platforms. For example on macOS if you click-and-hold on the close, minimize, or maximize window buttons, or open any of the app's menu bar tabs, the OS won't return from its event poll until that's done. I don't know what a cross-platform 'solution' to event-thread-blocking would be (if one even exists) aside from restructuring your code to not have timing-critical things run on the only thread that has arbitrary blocking due to user and OS interaction, but if one exists I think it'd make more sense to discuss it in a cross-platform context rather than in a Windows issue thread.
@slime73 I was simply unaware of that, since Linux doesn't seem to have any comparable issues, and I only have test environments for Windows and Linux. However:
I think from the SDL2 API side this is trivial. Just name it
In my opinion this is not necessarily as "brilliant" a design as some make it out to be, so let's just agree to disagree here. I think many others would see it like I do. And this can often be solved, too, by sticking with libraries that respect this problem better, instead of just hand-waving with "uh, throw threads at it or something." (Granted, SDL2 usually does respect this well outside of these few corner cases.) I could discuss this for a long time, but can we just work under the premise that it's useful if people aren't forced to work around this with threads?
(I just want to reiterate that any program that can't deal with the process being starved of CPU time is fundamentally broken no matter what we do or do not do with window resizing. If you replace "user is resizing the window" with "daily virus scanner started running and nothing is moving quickly now" or "system ran out of memory and started swapping heavily to disk" you still have a bug in your program if the audio goes out of sync or network connections drop, etc.)
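One general technique for tolerating a starved process is a fixed-timestep loop that clamps how much elapsed time a single frame may contribute. This is not SDL functionality; the 10 ms step and 250 ms clamp are arbitrary assumptions, and lockstep netcode would still need its network layer to survive the gap on its own. Times are integer milliseconds to keep the arithmetic exact.

```c
/* Fixed-timestep update with a clamp on how much wall-clock time one frame
 * may contribute, so a long stall (window drag, swapping, virus scan) does
 * not trigger an unbounded burst of catch-up steps. */
typedef struct {
    long step_ms;      /* fixed simulation step */
    long max_frame_ms; /* largest frame delta we accept */
    long accumulator;  /* elapsed time not yet simulated */
} fixed_step;

/* Returns how many simulation steps to run for a frame that took frame_ms. */
static int fixed_step_advance(fixed_step *fs, long frame_ms) {
    if (frame_ms > fs->max_frame_ms) {
        frame_ms = fs->max_frame_ms; /* drop the excess instead of replaying it */
    }
    fs->accumulator += frame_ms;
    int steps = 0;
    while (fs->accumulator >= fs->step_ms) {
        fs->accumulator -= fs->step_ms;
        steps++;
    }
    return steps;
}
```

With a 10 ms step, two 16 ms frames yield 1 and then 2 steps, and a 2000 ms stall is clamped so it produces 25 steps instead of 200.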
@icculus I don't understand. At face value your comment just seems irrelevant to me. Any networked action game will fundamentally drop out of the session if the entire PC hangs... so, huh? I am really surprised I even need to go into this, since SDL2 seems to encourage a less-threads-is-better design in general, so why is my request apparently so weird? How in particular is it strange to want the game not to misbehave and drop out just because I resize the window? Yes, disk I/O should be loading-screen only, or in threads. (Or non-blocking I/O! Threads are not always the only answer.) And yes, you can thread game logic and netcode, too. Should you? Should you, just to make resizing not break everything massively? How is this scenario so contentious all of a sudden? I'm legit stumped. So to get back to the issue, would it be possible to add a "let me do non-UI things on the main thread while the OS blocks the window" hook to SDL2? I find it really hard to believe it's just me finding that useful, even if I just scroll through previous comments. I don't understand this discussion. I also don't understand why "you HAVE to use threads" is an acceptable answer.
It seems very odd that this kind of industry-leading library is unable to let my code run when the window is dragged. For like a decade, from what I'm reading here? Just don't render anything, let all SDL code fail horribly, anything, but for FSM sake, don't block my code!
SDL doesn't know ahead of time that you're entering the message pump to be dragged for an indeterminate amount of time. This is a limitation of the design of the SDL_Event loop interacting with the Windows event loop. There are many workarounds, but they'd need to be implemented and all require changes on the part of the app. Perhaps the design could be revisited in SDL3 to not interact poorly, but I'm not sure what it would look like.
I am sure the technicalities of why this is a problem are sound, and I am sure it's the usual Microsoft thing that's causing it. However, the consequences are absolutely terrible. All I'm saying, all my criticism, is about priorities. I'm sure music visualizations and things like that are loving it. If they are smart, they'll probably let the music go bwbwbwbwbw until you let the window go. Heck, I'm going mad just having my average-time measurements f'ed up for seconds every time I have to move the window out of the way of the console after starting. Also, all of that needs to be combined with everything I've read about how you must not move the rendering or the event loop to different threads. All of this appears pretty extreme to me, measured, of course, against the standing SDL seems to have. It's not like I would complain about some dude's engine that way. Anyway, cheerio everyone. Just felt that this whole thing needed quite the kick in the behind.
There is a reason dragging/resizing the window or using the menus blocks execution of your SDL application. The way Windows handles those interactions, and always has going back even to Windows 1.x, is that DefWindowProc() goes into its own event handling loop to handle that action. This of course blocks the SDL event handling loop. The way DOSBox-X handles it is by modding the SDL library to maintain both a parent top-level window and a child window inside it, and then a separate thread handles message handling. If DefWindowProc() blocks for window size/move and menu interaction, then that thread is blocked while the main SDL application continues to run unimpeded. Perhaps official SDL development can handle it differently or more cleanly, but that's how you can avoid the blocking issue entirely.
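The core of the approach described above is a hand-off between the thread that pumps window messages and the thread that runs the game. Below is a minimal sketch of such a mailbox using a pthread mutex; it is not DOSBox-X or SDL code, all names are hypothetical, and the pump thread just posts integers instead of real window messages. On Win32, messages for a window are delivered to the thread that created it, so in a real version the pump thread would also have to create and own the window.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

#define MAILBOX_CAP 1024

/* Hypothetical mailbox: the message-pump thread posts input events here,
 * and the main (game) thread drains it without ever waiting on the pump. */
typedef struct {
    pthread_mutex_t lock;
    int events[MAILBOX_CAP];
    int count;
} mailbox;

static void mailbox_init(mailbox *mb) {
    pthread_mutex_init(&mb->lock, NULL);
    mb->count = 0;
}

/* Called on the pump thread. Returns false if the mailbox is full. */
static bool mailbox_post(mailbox *mb, int ev) {
    bool ok = false;
    pthread_mutex_lock(&mb->lock);
    if (mb->count < MAILBOX_CAP) {
        mb->events[mb->count++] = ev;
        ok = true;
    }
    pthread_mutex_unlock(&mb->lock);
    return ok;
}

/* Called on the game thread: copies out up to out_cap pending events in
 * FIFO order, shifts any leftovers to the front, and returns the count. */
static int mailbox_drain(mailbox *mb, int *out, int out_cap) {
    pthread_mutex_lock(&mb->lock);
    int n = mb->count < out_cap ? mb->count : out_cap;
    for (int i = 0; i < n; i++) out[i] = mb->events[i];
    for (int i = n; i < mb->count; i++) mb->events[i - n] = mb->events[i];
    mb->count -= n;
    pthread_mutex_unlock(&mb->lock);
    return n;
}

/* Demo pump thread: posts events 0..n-1, retrying while the mailbox is full. */
typedef struct { mailbox *mb; int n; } pump_args;

static void *pump_main(void *arg) {
    pump_args *a = arg;
    for (int i = 0; i < a->n; i++) {
        while (!mailbox_post(a->mb, i)) { /* spin until the game thread drains */ }
    }
    return NULL;
}
```

The game thread would call `mailbox_drain` once per frame and never wait on the pump, so a modal loop on the pump thread delays input but not simulation or networking. The fixed array with front-shifting keeps the sketch short; a ring buffer would avoid the copy.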
Appreciate the help, but really, I'm not using a cross-platform thingy that "is mainly used to handle cross-platform window management" only to work around window management tailored to specific platforms. The only solution that works is for SDL to simply be able to move a running program across the screen, even on an outlandish platform like Windows.
We're still discussing what the appropriate way to work around this Windows limitation should be for SDL3, which is why this issue is still open. While we discuss that, I'm going to lock this thread, as I think we have enough feedback telling us that people feel strongly about finding a resolution.
I've added a solution that dovetails nicely with the new main callbacks in SDL 3.0, and if you're not using that, you can set an event watcher to handle expose events and draw then. Thanks for all the feedback!
It turns out that the workaround only worked for macOS. Refs Genymobile#3458, libsdl-org/SDL#1059
Being angry because you are completely ignorant on a topic doesn't help you or anyone else. This issue is indeed related to Win32 modal loops, and also applies if you use Win32 directly. A little search on the internet, or just reading this conversation, would have told you that. Moreover, this problem is solved now, so I don't see the point of your intervention. Anyway, props to the SDL team for your amazing work on this library; you don't deserve such rude comments.
@RT2Code It should be the SDL team that apologizes for their own rude comments. If you make a multiplatform library like this, it is your responsibility to make sure that your event loop interacts correctly with ALL of the platforms. SDL had this issue for several years and they kept brushing it off, as many of the other people in this conversation have pointed out. @icculus's comments above, berating people because they expect this library not to block on window resizes (or even when holding down one of the window buttons on macOS), and even comparing this to someone unplugging an ethernet cable, were extremely disappointing and completely uncalled for. And no, the problem isn't solved. It still exists in SDL2. We still have to use a workaround for SDL2.
Many people have solved this problem easily. It just took the SDL team many years to fix it.
I didn't berate people; I offered several possible technical workarounds, and I locked this thread because it keeps generating unhelpful commentary like this, which is also why I'm locking it again now.
This is fixed for the SDL 2.30 release, in 509c70c
This bug report was migrated from our old Bugzilla tracker.
These attachments are available in the static archive:
patch (SDL_modeless_size_move.patch, text/plain, 2014-02-06 05:21:48 +0000, 11407 bytes)
Reported in version: HG 2.1
Reported for operating system, platform: Windows (All), All
Comments on the original bug report:
On 2013-08-30 01:00:19 +0000, wrote:
On 2014-01-20 05:04:25 +0000, Nathaniel Fries wrote:
On 2014-01-20 16:48:16 +0000, Nathaniel Fries wrote:
On 2014-01-20 16:54:40 +0000, Nathaniel Fries wrote:
On 2014-02-06 05:21:48 +0000, Nathaniel Fries wrote:
On 2014-02-09 10:08:44 +0000, Sam Lantinga wrote:
On 2014-02-09 12:43:34 +0000, Nathaniel Fries wrote:
On 2014-02-09 20:36:01 +0000, Sam Lantinga wrote:
On 2014-02-09 23:31:57 +0000, Nathaniel Fries wrote:
On 2014-02-20 21:05:23 +0000, Nathaniel Fries wrote:
On 2014-02-25 12:53:59 +0000, Sam Lantinga wrote:
On 2014-03-05 11:12:47 +0000, Andreas Ertelt wrote:
On 2014-03-06 10:06:56 +0000, Andreas Ertelt wrote:
On 2014-03-09 01:08:36 +0000, Nathaniel Fries wrote:
On 2014-03-11 07:11:10 +0000, Andreas Ertelt wrote:
On 2014-03-12 19:09:40 +0000, Nathaniel Fries wrote:
On 2014-03-12 20:40:33 +0000, Nathaniel Fries wrote:
On 2014-03-16 09:42:03 +0000, Nathaniel Fries wrote:
On 2019-12-07 17:00:27 +0000, Jake Del Mastro wrote:
On 2020-03-24 21:13:36 +0000, Ryan C. Gordon wrote:
On 2020-04-16 16:50:42 +0000, Ron Aaron wrote:
On 2020-04-16 19:19:48 +0000, Andreas Ertelt wrote:
On 2020-04-18 13:08:58 +0000, Andreas Ertelt wrote:
On 2020-07-12 09:53:51 +0000, Jack C wrote: