-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "video: Prefer Wayland over X11 (take 2!)" #9345
Conversation
23f9898
to
d7b63cc
Compare
If we do this, we are basically accepting these issues are unfixable for the next ten years (SDL4). Having this as the default in SDL3 (which isn't being used yet!) is important for signaling to other stakeholders that we actually do need to get solutions in place for detected issues. |
We are absolutely accepting they are unfixable right now yes, because they are -- but probably not for the next ten years. We can flip the default later in SDL3's lifecycle, or even based on whether fifo-v1 and commit-timing are exposed in the wl_registry.
SDL is not your tool for "signaling to stakeholders" about what is important. It's an actual library used by real developers and users! You are suggesting we do those users and developers a disservice by forcing an fundamentally broken default onto them purely for your own ideological reasons. It's really important as a developer is to be pragmatic and separate your own desires from what actually benefits your customers/users. |
|
Should this be an issue report (talking about the specific problems, and options for what to do about them) rather than a pull request (already picking a specific option seemingly without prior discussion from people involved in this)...? |
IMHO it wouldn't make much sense as the issue exists entirely outside of SDL, it has to do with how Vulkan and GL calls behave when running under native Wayland and since, at least in the case of Vulkan, those are calls are made by the user directly I really doubt there can be a different solution. In other words this is a Wayland WSI issue and not an SDL issue. |
Given that we just released the SDL 3.0 preview, I'm leaning towards making sure gamers and developers have a good experience. We can change this default again in the future. @Joshua-Ashton, can you add a code comment briefly describing the issues and the criteria for making Wayland the default again? @icculus, @Kontrabant, do you have any concerns before we merge this? |
It makes as much sense (or more) than a SDL pull request, I think. :)
I haven't heard anyone weigh in on how long it may take for those issues to be resolved in Wayland and in drivers, and whether their timelines would miss SDL3's full release or not.
Personally I'm looking forward to improved DPI scale support. |
Aside from the FIFO wayland protocol (which can be implemented whenever anyone wants), I don't have any references of upstream discussions related to these problems which point to the FIFO protocol or similar things. Personally, I didn't have any idea the protocol was so important until today. |
@Joshua-Ashton I understand your opinion and conclusion, but given that SDL3 has not yet been released, do you think any decision on this could be deferred until very shortly before the release? I am coming from a different angle, of scientific applications and vendors of proprietary apps, and many do not want or can not support Wayland due to its limitations. And that will never change unless someone makes some noise about it, so I have a few protocols proposed to fix the raised issues. One annoying thing is that the affected app vendors do not want to engage with Wayland protocol discussions, so there's no weight or urgency behind the protocol additions. So, maybe this decision can be deferred for now and instead the SDL project could throw its weight behind the respective Wayland protocols so they get implemented faster? If nothing happens in the required timeframe and if at SDL3's scheduled release time the issues still persist, you can for sure switch back to X11. But then the project has tried its best to collaborate and fix issues at the source, rather than papering over them too early. |
This reverts commit f9f7db4 Wayland has a myriad of unresolved problems regarding surface suspension blocking presentation and the FIFO (vsync) implementation being fundamentally broken leading to reduced GPU-bound performance. That is not to say "we should fix FIFO in Mesa/other drivers," but rather that it is completely unfixable without an additional protocol, in this case fifo-v1[1]. Without this protocol, vkQueuePresent or glSwapBuffers *must* stall for the 'frame' callback after presenting an image. The only reason we can get away with this on SteamOS is because Gamescope implements what is essentially fifo-v1 and we use that there. The other side is surface suspension -- a very similar issue to the above wrt the frame callback being used in that way and blocking. If the SDL window is obscured, vkQueuePresent *will* block in FIFO, which games typically do NOT like. This is solved by the combination of fifo-v1 and commit-timing-v1. There is no advantage to games and average applications preferring Wayland over X11 -- only severe performance and unusability regressions right now. Thus, we must revert this change until fifo-v1 and commit-timing-v1 are released and at least in a stable release for major compositors. [1]: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/256
d7b63cc
to
fca4099
Compare
Rolling back just means you're running through XWayland instead, which has the same underlying issues with things like throttling occluded windows, and loses things like scaled framebuffer support vs running natively. The only real way to completely avoid this is to run a native X session until the issues are addressed. |
Done.
Most of these discussions re. fifo have just been Mesa developers speaking together over IRC or elsewhere out-of-band. I've been making a big deal out of at least surface suspension for multiple years now, but Wayland progress is slow. :/ At least we are getting somewhere now... I hope. :)
The reality of this will not change with time or before release. The right solution is for us to prefer Wayland only if fifo-v1 + commit-timing is available -- otherwise default to X11. We can easily test the wl_registry for those, and have two Wayland_bootstrap entries, one for _preferred, one for _fallback (ie. if no X11) |
If this PR isn't the right solution, can you sketch one up that is (or will be)? |
XWayland doesn't have the issue wrt. fifo as the Vulkan X11 WSI implementation there can use a presentation thread and implement a commit queue on the client side. This is not possible in the Wayland WSI as it is ABI that QueuePresent = wl_surface_commit. Wrt. throttling, that does not happen on XWayland or when using an X11 graphical compositor as every window there becomes If you are using X11 without a compositor, you will be throttled to 1Hz when your window is occluded, however if you are using a native Wayland window and are occluded, you will never be able to make forward progress at all, and your application will stall forever. Wayland WSI is actually violating the Vulkan forward progress guarantees for presentation right now because of this. |
It's the right solution until those protocols land in upstream wayland-protocols. |
This is probably why things drag on for so long. If all the discussions are out-of-band in a way that other stakeholders (particularly integrators, desktop developers, etc.) have no visibility, then there will be no movement. (Yes, I am aware of some issues with some stakeholders, but generally most of us didn't know anything until just now.)
What is the timeline for the SDL 3 release? |
That depends largely on the feedback on the Preview Release. Realistically not shorter than 3 months. We still have over 300 issues in the release milestone. |
Could we hold this for a little while then while we try to get things sorted out in the various upstreams? |
The surface suspension discussion happened partly publicly here https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/99 |
I must mention here that this statement is really not true. Going through Xwayland means wrong behavior with fractional scaling - with most compositors that means reduced resolution in the game, but even on KWin it causes problems with multiple monitors that have different scales (on displays with a lower scale you get an increased resolution in the game).
Xwayland can do only 8 bits per color as well, which is a minor benefit for going with native Wayland. It's also worth noting that Xwayland is not completely immune to the surface suspension issue either, it throttles windows that don't get frame callbacks to 1Hz. There are significant problems with running natively on Wayland, but please let's not pretend Xwayland is great either. There are benefits and downsides to both choices, and they need to be weighed against each other, it's not just an obvious choice for one side. |
Even assuming the protocol was somehow merged and also implemented by enough compositors in just 3 months (which is pretty much impossible) I don't think this could reasonably reach stable releases in 3 months? This issue is very well known AFAIK among those who work around the area. |
I'm a little confused right now from your messages, I think this sums things up if I got it right?
So I think having this as an issue discussion, and making that draft PR seem like good ideas. Unless I misread something? |
That's just hiding the problem I think, can you detect easily if the compositor has fifov1 |
|
Doesn't seem true for me on Plasma 6 Wayland, labwc and Gnome -- I also don't see anywhere in xserver that'd do that. Everything is |
Running testsprite (with --vsync) on Gnome + XWayland, it throttles to 1hz for me when the window is occluded with both the GL and Vulkan renderers. |
This is also not true -- this only applies to child windows that do not fill the whole window, and applications using XCopyRect, or other XComposite extensions. Games and most apps will not get an extra copy, as they fill the whole window, or have a child window that fills the whole window.
Sure, but that's an added bonus really. That's actually one of the main reasons the Gamescope WSI layer exists. The other regressions are pretty fundamental and pretty bad. 10-bit color for your game isn't really useful if the GPU-bound performance is totally frogged frame-pacing wise :D |
So that's the main concern right? Not so much the eventual support by compositors? If the protocol is sorted out that you can reliably check for it's support at runtime, you can prefer wayland, otherwise go with X? |
Yes, that is what I am proposing for us to add when the protocols are finalized. |
Call this a total shot in the dark, but: Did the Steam overlay ever get Wayland surface support added? I've not checked in ages but that would probably be good to have to make discussions on this more thorough; it's one thing if we have a complete end-to-end implementation of an all-Wayland system and give compositors as much information as possible in implementing the protocols, it's another if Steam just never implements for desktop or embedded and SDL chickens out without any real commitment from any involved party to fix this. We'd just be back at square one again, which I'm not really up for doing a third time in four years(!). If not the overlay, and if gamescope does indeed do this right, maybe it would make sense to default to Wayland when gamescope is running, and if it can expose the protocols early, we can make it more robust to default when the protocols are exposed, as someone suggested earlier. As long as something is happening that's fine, but until I see something in writing that says this is going to get done then it's hard for me to review this change in good faith, which for me is just an automatic "no". TL;DR: My customers get good results with this backend (even on Steam!); unilaterally sweeping others' effort under the rug as a first resort, without any consideration of other shareholders' experiences, does not sit well with me without some really good evidence that this isn't a lazy cop-out. |
Steam overlay doesn't work in wayland-native apps. |
No, it hooks Vulkan, etc just fine, but there is no input on the overlay and Steam Input is not functional. For GL apps, it does not work at all with Wayland. Sorting this out with Gamescope would be cool, using the same stuff we use on Deck for the overlay -- especially as we have the Wayland subsurface backend (which is not affected by the same issues referenced in this MR), but we are not at that point yet. |
Another missing feature in the Wayland stack is support for opening a game on the primary display, afaik. |
No, SDL3 does that if there is some means available to determine it. |
Yes, via kde_output_order_v1 - there is no generic way yet... |
i am just curious but doesn't https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/201 solve the surface suspend issue ? |
Please do NOT! go through with this PR! |
No, it basically lets applications know that they are occluded or otherwise in a state such that they can suspend rendering if they want to. The actual presentation throttling is done at the compositor level as frame callbacks stop being sent. |
And via d-bus display properties on Gnome. Not ideal I know, but until there is a better way… |
Is this about SDL users / game developers testing according to Steam requirements? I suppose relying on the SDL3 default for proper testing is not a very good idea. How about setting SDL_VIDEODRIVER and WAYLAND_DISPLAY instead? |
Respectfully, your average gamer/user does not care about the technical merits of Wayland vs X11. If running this natively with Wayland will, at the moment, deliver an objectively worse experience, i feel like SDL should run using XWayland by default until that is resolved and then switch back to Wayland by default when it is fully ready. IMO, it makes sense to merge this now. |
I'm not very familiar with Wayland, but if I understand correctly, the issue is that frames are being drawn only when the application is being polled by the frame callback? How does that enable VRR? With VRR it is the rendering application who should dictate the refresh rate by announcing that it has a frame ready and get it rendered ASAP, no? There's also a protocol enabling screen tearing which I would expect to send the frame to the monitor ASAP too... Or is the frame callback mechanism limited to v-synced applications? In that case, maybe rendering with the VRR\screen tear mechanisms instead of the v-sync polling mechanism fix these issues? |
@Conan-Kudo Just for context (FIFO issues), firstly we should waiting something like this |
Despite being a proponent of reverting Wayland-by-default last time, I'd really prefer to keep Wayland the default for 3.0, as I think there are some real advantages to making the transition at the same time as moving from SDL2→SDL3. One of the reasons this was backed out for 2.0.22 was that there were a number of SDL2 applications which only worked with X11 (due to directly calling Xlib/glX functions, requiring mouse warping, etc), and by switching the default at the same time that we break API compatibility, we won't have to worry about breaking existing applications. (Except via sdl2-compat, which already has a quirks mechanism in place for dealing with this sort of thing.) As I understand it, there are three (and-a-half) regressions compared to XWayland (please correct me if I am missing anything or otherwise wrong):
Equally, there are benefits to Wayland, particularly around scaling (including the framebuffer scaling features), HDR support, etc. So for some cases, Wayland can be the better option. Of course, these aren't regressions, and it does make sense to treat regressions as more important to new features (up to a point). Given that SDL3 hasn't been released yet, and so no-one should be using it yet, there isn't any particular rush to revert the default back to XWayland now. If we need to delay the SDL3 release a bit for some of these protocols to be implemented, we can have that discussion — I don't think there's the same pressure to upgrade from SDL2→SDL3 as there was from 1→2 with the fullscreen mode change issues. So defaulting to Wayland shouldn't have a negative impact on users — developers can keep using SDL2 until Wayland's ready. (And if they really want to use it early — and having to deal with games which shipped API-incompatible SDL 1.3 builds makes me hope that this is rare — they can patch the defaults with the library they ship.) But the reality is that distros, the steam runtime, etc do upgrade minor SDL versions behind the backs of games, so we really need some way of getting people to actually test with and support the Wayland backend from day 1 of SDL3 if we want to be able to change the default before SDL4. And setting the default now seems to me the nicest way to handle that. At the very least, let's keep Wayland the default for now, at least until we've finished locking down the API and are getting ready for release. |
@Joshua-Ashton Moving the blocking to ANI is unfixable without additional wayland protocols. Getting rid of indefinite blocks in present however is something we can fix and I did provide patches for it (and we can increase the minimum frequency to 30hz). I'd much rather we work around the missing protocol in mesa than reverting to X11 for SDL. |
No, the driver has no idea what an xdg_* anything is. |
@sulix There is no HDR support. The only HDR support exists with using Gamescope + the Gamescope layer which works with X11 windows. Gamescope has a frog_ protocol it talks to Plasma with in the nested case and does thr DRM HDR side itself in the embedded case. As mentioned before, we can just flip default to Wayland if the fifo-v1 + commit-timing protocols are available. |
I'm wondering if checking for fifo-v1 and commit-timing-v1 as triggers to revert to xwayland would even make sense considering there is no guarantee that these specific extensions will ever be accepted. |
This mesa MR works around the problem for the most part: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28419 Obviously having the protocols would be better but with the MR I don't think we need to prefer X11 over Wayland. |
Well, now all of us are aware that we need it, so that puts some urgency that we didn't really have before. That's often the problem with wayland-protocols: since merging them requires implementations to be ready first, writing code needs to be prioritized. |
Sorry — I missed that the KWin implementation was just the cut down frog_ protocol. But that does work with SDL's wayland backend with the hacky vulkan layer as documented on the KWin MR, right? Either way, I do agree that it's not ready now, so shouldn't be much of a consideration at the moment, and you'd know way more about it than me. :-)
My worry with that is that developers will test with an SDL3 build, it'll default to XWayland, and then when these protocols become available on users' machines, the exact same binaries will behave differently (e.g., mouse warping will just stop working). Maybe we can document that developers should absolutely also test with The safest option, IMO, is still to delay a final SDL3 release until Wayland is ready enough to be the default, at least on some setups. (In which case, maybe it makes sense to gate it on fifo-v1+commit-timing for people running older distros.) Whether a delay is a worthwhile tradeoff will, of course, depend on how long it takes for these to start landing (and how long all of the other things for SDL3 end up taking as well). In the meantime, I'd prefer we keep Wayland the default for these 'preview' releases, though, as it has more restrictions from a developer POV (mouse warping, windows don't show until a frame is renderered, etc), and in my experince an SDL game which works on Wayland is more likely to work fine on XWayland than vice-versa. |
So I guess these extensions are not controversial and there is some consensus towards them. If that's the case I'm relieved: some other extensions like the one to remember windows placement are in a much worse position. |
Yes, these aren't "messy" like those... 😅 There's still some protocol refinement being done, but that and implementations can be figured out in parallel. Then we can get acks from stakeholders once things are pinned down. |
Thanks for the feedback everyone! It sounds like the Wayland folks are aware of the issues and are making progress towards addressing them. I'll leave Wayland default for now while they are doing that, so it's easier for people to test and provide them feedback. We will re-evaluate this as we get closer to full release. @Joshua-Ashton, can you create individual issues that impact this and create a task list in the original merge request report, for tracking? I'm going to lock this conversation for now so we can focus on development. |
Since the performance issues motivating this PR are resolved by the combination of fifo-v1 and commit-timing-v1, let's check for those protocols before using Wayland by default. @Joshua-Ashton, I'll go ahead and close this PR. Can you open a new one implementing that? For those following along, this change guarantees that high performance Vulkan applications like games have the best experience by default. Hopefully these protocols will land soon, but either way the user has final control over the backend being used, via the SDL_VIDEO_DRIVER environment variable. |
Description
This reverts commit f9f7db4
Wayland has a myriad of unresolved problems regarding surface suspension blocking presentation and the FIFO (vsync) implementation being fundamentally broken leading to reduced GPU-bound performance.
That is not to say "we should fix FIFO in Mesa/other drivers," but rather that it is completely unfixable without an additional protocol, in this case fifo-v11.
Without this protocol, vkQueuePresent or glSwapBuffers must stall for the 'frame' callback after presenting an image. The only reason we can get away with this on SteamOS is because Gamescope implements what is essentially fifo-v1 and we use that there.
The other side is surface suspension -- a very similar issue to the above wrt the frame callback being used in that way and blocking. If the SDL window is obscured, vkQueuePresent will block in FIFO, which games typically do NOT like. This is solved by the combination of fifo-v1 and commit-timing-v1.
There is no advantage to games and average applications preferring Wayland over X11 -- only severe performance and unusability regressions right now.
Thus, we must revert this change until fifo-v1 and commit-timing-v1 are released and at least in a stable release for major compositors.