Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve safety around HasRawWindowHandle #71

Closed
maroider opened this issue Jun 16, 2021 · 29 comments
Closed

Improve safety around HasRawWindowHandle #71

maroider opened this issue Jun 16, 2021 · 29 comments
Labels
enhancement New feature or request

Comments

@maroider
Copy link
Member

maroider commented Jun 16, 2021

We got a bit sidetracked by this in #70, so I'm opening this issue to track it separately.

To summarize the problem briefly:
Letting users use windows with crates like wgpu without requiring the user to write any unsafe code is desirable.
TrustedWindowHandle can't really be used safely after it has been around for a bit. Unfortunately, there's no guarantee that the underlying window handle won't become invalid while a TrustedWindowHandle is alive.
gfx-rs/wgpu#1463 has a more in-depth outline of the issue at hand, and suggests that raw-window-handle should add the following impls of HasRawWindowHandle:

impl<'a, T: HasRawWindowHandle> HasRawWindowHandle for &'a T { /* .. */ }
impl<T: HasRawWindowHandle> HasRawWindowHandle for Rc<T> { /* .. */ }
impl<T: HasRawWindowHandle> HasRawWindowHandle for Arc<T> { /* .. */ }

The addition of these impls of HasRawWindowHandle makes TrustedWindowHandle seem kind of useless (for lack of a less biting word).

@maroider maroider added the enhancement New feature or request label Jun 16, 2021
@Lokathor
Copy link
Contributor

At a glance, I don't understand what the extra HasRawWindowHandle impls accomplish, because don't any of those values just deref to the contained type when necessary anyway?

@maroider
Copy link
Member Author

It's not explicitly documented as a guarantee of HasRawWindowHandle, but winit and sdl2's window structs destroy the underlying window when they're dropped.
wgpu wants to make sure that a wgpu::Surface is always valid, and as such they need to make sure the Window isn't destroyed while the Surface is alive, because there's no guarantee in the spec that a Vulkan surface will keep the underlying window handle valid.
Surface could just expose a method that allows you to access the underlying window, but that's also not the most convenient thing in the world.
The extra impls allow for convenient access to the underlying window type without having to go through such accessors. Glium is a particularly bad example here (even though Glium doesn't use raw-window-handle) in that you need to go through two layers of accessors to get at the winit::window::Window.

@maroider
Copy link
Member Author

maroider commented Jun 16, 2021

And I forgot to do this earlier, but cc @pythonesque @kvark

@Lokathor
Copy link
Contributor

Hm, well those are all reasonable impls to have, so I'm for it if that fixes things for wgpu.

@pythonesque
Copy link

Yep, that proposal still stands--having these implemented would be really helpful!

@pythonesque
Copy link

pythonesque commented Jun 17, 2021

It's not explicitly documented as a guarantee of HasRawWindowHandle, but winit and sdl2's window structs destroy the underlying window when they're dropped.

It's not documented, but from the existing requirements on the HasRawWindowHandle trait, we can infer something related: as long as the window isn't dropped, any implementor of HasRawWindowHandle has to keep any handle it handed out alive. So that's one of the main reasons for the proposal to work like that, it mean no new requirements for downstream implementors of the trait.

@Lokathor
Copy link
Contributor

One related thing: I heard that on Android the window's handle doesn't change but you're only allowed to use it at specific times. is that a concern here?

@pythonesque
Copy link

One related thing: I heard that on Android the window's handle doesn't change but you're only allowed to use it at specific times. is that a concern here?

If it is, then I think it needs to be reflected in other APIs (for stuff that actually use the window). But IIRC this is not an issue in practice on Android (can't remember the details).

@pythonesque
Copy link

pythonesque commented Jun 17, 2021

I realized this while responding to a comment from someone else, but I'm not sure the Arc<T> implementation is actually safe with the current unsafe trait implementation, thanks to Arc::make_mut and friends. I think for it to be safe, we'd need something like pinning, it might have to be done for like Arc<UnpinWrapper<T>>, with the HasRawWindowHandle API taking Pin<&Self> rather than self. Which is a little more annoying to use but I think in practice usable in exactly the same way in all the cases where this is safe. This also applies to Rc<T>. (Note that you can create an Arc or Rc as pinned which is the correct way to use this API).

Here UnpinWrapper would just prevent people from making the window Unpin.

@Lokathor
Copy link
Contributor

I mean if you're concerned about a window somehow altering its handle value through &mut self (which is already a violation of the unsafe contract), then don't forget that it can alter its handle value through &self as well.

@pythonesque
Copy link

pythonesque commented Jun 17, 2021

I mean if you're concerned about a window somehow altering its handle value through &mut self (which is already a violation of the unsafe contract), then don't forget that it can alter its handle value through &self as well.

No, I am worried about people swapping out the window for something else and then dropping the window, which is easy to do when you have &mut self and can't be prevented by any contract. Which is pretty much the whole reason Pin exists. OTOH invalidating the handle through mutations to &self violate the contract.

@pythonesque
Copy link

pythonesque commented Jun 17, 2021

Actually wait, hang on a second, I am dumb. Obviously if you were to do this you can't rely on the unsafe contract anymore, since it only applies as long as the original window is alive, so the pinning stuff probably isn't necessary. Or at the very least, it's more of a language lawyering issue than anything else (what does "is alive" mean).

That being said, the pin version of the API is definitely more obviously safe, since it guarantees that the window can't move after the value is called... I actually think with some slight changes to the HasRawWindowHandle contract, we could even avoid needing to specifically opt in Rc and Arc like this. But that has to be weighed against it being annoying to change the existing APIs.

@msiglreith
Copy link
Member

One related thing: I heard that on Android the window's handle doesn't change but you're only allowed to use it at specific times. is that a concern here?

To clarify, between suspend/resume the ANativeWindow which is returned as window handle may change (device dependent).

@Lokathor
Copy link
Contributor

That sounds very tricky to work around :/ wouldn't you have to throw out the old context and then refresh everything?

@pythonesque
Copy link

pythonesque commented Jun 17, 2021

One related thing: I heard that on Android the window's handle doesn't change but you're only allowed to use it at specific times. is that a concern here?

To clarify, between suspend/resume the ANativeWindow which is returned as window handle may change (device dependent).

I'm pretty sure this means HasRawWindowHandle just can't be implemented safely on Android, the contract it promises can't really be guaranteed. Or if it can be, it has to use some sort of indirection. Which is definitely annoying, but kind of a separate issue.

It might be beneficial to know how actual Android applications deal with that, because it sounds pretty unworkable in practice unless there are safepoints dictating where suspension can happen (possibly integrated into the runtime somehow) which can patch up the handles.

@msiglreith
Copy link
Member

msiglreith commented Jun 17, 2021

I'm pretty sure this means HasRawWindowHandle just can't be implemented safely on Android, the contract it promises can't really be guaranteed.

In one PR I weakend the guarentees for the window handle to account for Android. As it is in 0.3, winits implementation at least would is not be able to follow the docs.
Addition: https://github.com/rust-windowing/raw-window-handle/blob/master/src/lib.rs#L50.

It might be beneficial to know how actual Android applications deal with that, because it sounds pretty unworkable in practice unless there are safepoints dictating where suspension can happen (possibly integrated into the runtime somehow) which can patch up the handles.

Yes, the NativeActivity will use specific events/callbacks to notify the user regarding the surface lifetime. If you are interested, here are some details: https://source.android.com/devices/graphics/arch-sv-glsv

In winit these are basically just forwarded to the user as well:

  • ANativeActivity::onNativeWindowCreated -> ndk-glue::Event::WindowCreated -> winit::Event::Resumed
  • ANativeActivity::onNativeWindowDestroyed -> ndk-glue::Event::WindowDestroyed -> winit::Event::Suspended

@pythonesque
Copy link

pythonesque commented Jun 17, 2021

In one PR I weakend the guarentees for the window handle to account for Android. As it is in 0.3, winits implementation at least would is not be able to follow the docs.
Addition: https://github.com/rust-windowing/raw-window-handle/blob/master/src/lib.rs#L50.

Unfortunately this addendum pretty much makes the trait's guarantees completely useless since it doesn't specify what the system events are or how to handle them safely. I think this should be rethought somehow.

In winit these are basically just forwarded to the user as well:

  • ANativeActivity::onNativeWindowCreated -> ndk-glue::Event::WindowCreated -> winit::Event::Resumed
  • ANativeActivity::onNativeWindowDestroyed -> ndk-glue::Event::WindowDestroyed -> winit::Event::Suspended

This is definitely nowhere near good enough for a safe interface in itself though. Even assuming winit were to add some mechanism for handling the events itself by actively updating all the pointers to point to the new handle (and/or changed on Android to use indirection into the window or something), that's not enough if anyone in another thread can try to perform direct operations on a surface, since those updates wouldn't have to synchronize through the event system. Are you absolutely positive that old pointers to windows (e.g. those used in ongoing background operations) are invalidated by these events? If so, at what point are they invalidated--or is it just not safe to have pointers to the handle in those threads, period?

If the latter is the case, then the window itself (or the window handles) probably need to be !Send on Android, which would be super annoying for everyone on platforms that didn't have this property--and that's just scratching the surface of what's difficult here (indeed the docs seem to suggest just not to pass surfaces directly between threads). Additionally, after a window is destroyed, what do you do with subsequent operations that want to operate on a window? Ignore them? Block until the window comes back? Store them in a queue? Crash? None of it seems all that appealing (well, maybe you could make an argument for ignore), and I certainly wouldn't want to force every non-Android system have to explicitly deal with the possibility that the Surface vanished.

Ultimately I would suggest that rather than designing the interface around Android's limitations, which are severe, we look to see if there's any possible way we can make Android look more like other platforms, even at the cost of some performance. For example, maybe surface operations would require sending a message to a channel on the render thread on Android (or something). Or on Android, every request to a surface in another thread takes a read lock and calls to destroyed etc. block until all readers go away.

@Lokathor
Copy link
Contributor

Well a window is !Send on Win32, and also you can't alter the GUI from outside the main thread on Mac, so basically you should be assuming a window is !Send to begin with.

My inclination here is to make the unsafe contract be simply that "these values are truthful and they are usable right now", with no promise of the duration of usability. Of course this is not ideal, because then someone else has to step in and coordinate surface usage with system events. So if someone has a better idea I'm all ears. However, absent that, I'm not sure we can have a requirement that one of our core five OSes can't meet.

@pythonesque
Copy link

pythonesque commented Jun 17, 2021

Well a window is !Send on Win32, and also you can't alter the GUI from outside the main thread on Mac, so basically you should be assuming a window is !Send to begin with.

With the proposal I outlined (involving being generic over the window), anyone holding a !Send window would already be bound to the current thread automatically, so that's a good start, but winit would still need to be responsible for automatically updating handles when these events come in. But I think it's pretty obvious that currently, this is not the case--people happily hold onto raw window handles without caring about this! Adding something like an indirection mechanism directly within Android windows, together with a Cell and immediate handling of surface drop / resume messages to update the Cell value, would enable these APIs to work like people think they do.

My inclination here is to make the unsafe contract be simply that "these values are truthful and they are usable right now", with no promise of the duration of usability. Of course this is not ideal, because then someone else has to step in and coordinate surface usage with system events. So if someone has a better idea I'm all ears. However, absent that, I'm not sure we can have a requirement that one of our core five OSes can't meet.

In that case, all consumers of the trait right now (including winit and wgpu) can't actually safely hold onto window handles, so your proposal pretty much breaks all existing users of the trait--they would all then have to write their own custom OS-handling logic. I don't think that's preferable at all to finding a way for the Rust abstraction over Android window handles not to become invalidated. Not being able to rely on window handles living longer than their reference also basically means there's no point in having something like RawWindowHandle at all, since you'd just be passing it directly to some platform-specific API. It also means that all of these consumers would have to program super conservatively, basically assuming every platform is as restricted as Android, which would be a notable quality of life and performance reduction in the common case where this isn't true.

Moreover, if winit is not handling these kinds of messages safely itself, then it's not a safe interface anyway and there is no point in using it--the whole reason to use winit is as a cross-platform, safe interface to windowing. A cross platform safe interface that supports four platforms is better than an unsafe interface that requires specific platform knowledge to use safely, but "supports" five platforms. At least, this is true for the vast majority of Rust users. I promise you that if you just mark the current interface unsafe (or propagate that bubble up) people will have no idea how to use it safely and will just add the line to their programs without really thinking about it, which is definitely a much worse state of affairs.

Again I think we need to think about how we can design the window and window handle API on Android in a way that's at least type safe (even if that means needing !Send). I'm pretty sure we can come up with a design for it, and I doubt that whatever we do will be notably different from what clients that wanted to support Android would have to do anyway. But we do have to at least acknowledge that it's needed.

@pythonesque
Copy link

A key issue here, I think, is that it's not safe (with any interface) to ask for a handle to an Android window, even by reference, unless you know for a fact that the window is managed by the Android event loop (and hence is properly receiving updates to window handles). That suggests to me that something like winit can only provide a safe handle to an Android window once the event loop is set up, to ensure that it actually processes messages as it receives them (if the synchronous messages are handled by interrupts, things get considerably more complicated, and functions that process existing handles would probably need to be surrounded by code that disables interrupts). This is all true regardless of whether you have indirection or not, so at that point, why not just make it official and codify the indirection as part of Window, possibly along with a runtime flag that panics if you try to use a window handle before the event loop started?

@Lokathor
Copy link
Contributor

Now, hold on there, that's a lot to take in.

Even if winit isn't perfectly safe, it still has great value in simply being cross-platform and pure rust. It's not as if there's other crates that can replace it even if you throw out the safe aspect of it. I would love if it's also safe, but at the end of the day we're not picking winit or alternate_lib, we're picking winit or nothing.

I'm not saying there needs to be any particular Android or other design, or even an instant decision. I'm just saying that if we can't come up with something after thinking on it for a while and we have to pick between "weaken the lib contract" and "throw android under the bus" then I'm gonna lean toward the former. My goal here is that I really don't want the default situation to be that Android gets left in the dust with Rust programs.

I can also say that the original thought behind this lib was that we just need it to make winit and foo_graphics_lib talk to each other in a very minimal way, just to free ourselves from the coordination problems when either lib does a semver break. Then the actual coordination of the two libs together would still fall upon... whichever code is using both libs. Basically, the same as Glutin combines GL with Winit, other crates would still be bundling up the vulkan/metal/wgpu/whatever surface (or similar) with a window, and the combining code would ensure that things have the correct life and so on. Here, "crates" includes "the code for the bin itself can do it if the writer of that is willing to eat a little unsafe". I'm not saying it has to be that way, that was just the original intent, so that's where the origin of this crate is coming from.

Now, I don't work on winit itself, I don't even use winit actually, but looking at their 0.25 docs it looks like you do need to have an event loop created before you can create a window. So it seems like part of what you want is already the case. The rest where the handle is held inside a cell or something, or somehow the window can signal to users of the handle that there's a new handle... that part doesn't seem to exist yet.

@pythonesque
Copy link

pythonesque commented Jun 18, 2021

Now, I don't work on winit itself, I don't even use winit actually, but looking at their 0.25 docs it looks like you do need to have an event loop created before you can create a window. So it seems like part of what you want is already the case. The rest where the handle is held inside a cell or something, or somehow the window can signal to users of the handle that there's a new handle... that part doesn't seem to exist yet.

The event loop is created there, but it's not yet being run on the current thread (we know this because we're running code on the same thread it's supposed to be running on), which means if any messages are being sent to it (I don't know if they can be) they can't possibly be being processed by user code yet (unless it involves registered interrupts, in which case even the weakened implementation you mentioned can't be safe without substantial changes). I'm not sure exactly when the resource destruction completes, but I think that would need to be investigated before we could assume this was safe. This needs to be resolved by walking through winit and talking to someone more familiar with Android.

I'm not saying there needs to be any particular Android or other design, or even an instant decision. I'm just saying that if we can't come up with something after thinking on it for a while and we have to pick between "weaken the lib contract" and "throw android under the bus" then I'm gonna lean toward the former. My goal here is that I really don't want the default situation to be that Android gets left in the dust with Rust programs.

The crux of what I'm saying is that, basically, that if winit (and by extension as-raw-window-handle, since we all know winit is the most used implementor of AsRawWindowHandle by far) doesn't expose an interface that's safe on Android, Android will get thrown under the bus for cross-platform development whether that's intended or not. Nobody using a library whose entire purpose is to simplify cross-platform development is going to go out of their way to special case Android, so there will just be silent unsafety (if this is indeed unsafe--which I'm still not totally clear on, from reading about Java equivalents of this situation!).

Therefore, this library should provide a strong enough contract that libraries that need to hold onto window handles can safely use them, somehow. What I'm arguing, basically, is that the present (and proposed) contract of AsRawWindowHandle is too weak to be able to actually abstract over the windowing implementation, which means people can't usefully rely on it in a window-agnostic way without giving up Android support. Once we figure out the right way to handle Android windows safely, it can and should be rolled back into winit itself, which would again allow use of the stronger contract.

@pythonesque
Copy link

pythonesque commented Jun 18, 2021

Looking at the current implementation, it's pretty clear why it's unsafe... we are copying a pointer from out of a reader-writer lock. https://github.com/rust-windowing/winit/blob/master/src/platform_impl/android/mod.rs#L592 . Note that native_window() returns a RwLockReadGuard.

https://developer.android.com/ndk/reference/group/a-native-window#group___a_native_window_1ga533876b57909243b238927344a6592db shows that there should be ways to refcount the native handle to ensure that it doesn't go away unexpectedly, but since we're not calling this this is manifestly unsafe. So it seems to me now that just reworking our libraries slightly can turn failing to handle these window changes on Android from a safety issue to a mere usability one (though I think winit should still probably try to handle these events as promptly as possible to avoid unexpected writes to dead windows). This would allow the old as-raw-window-handle contract to be reinstated without changes.

@Lokathor
Copy link
Contributor

we are copying a pointer from out of a reader-writer lock.

Incidentally I also think they shouldn't be doing a panic when the handle isn't available, but that's a separate issue.

So it seems to me now that just reworking our libraries slightly can turn failing to handle these window changes on Android from a safety issue to a mere usability one

Quite reasonable.

So this means... that we're blocked on winit fixing up their android code?

@pythonesque
Copy link

pythonesque commented Jun 18, 2021

we are copying a pointer from out of a reader-writer lock.

Incidentally I also think they shouldn't be doing a panic when the handle isn't available, but that's a separate issue.

So it seems to me now that just reworking our libraries slightly can turn failing to handle these window changes on Android from a safety issue to a mere usability one

Quite reasonable.

So this means... that we're blocked on winit fixing up their android code?

Basically, yeah. But someone who works on android-sdk should chime in with more, since it has a rather complex implementation with pipes and also has a single static RwLock rather than something thread-local, even though (AFAIK?) the messages all come to a single thread.

@Lokathor
Copy link
Contributor

Lokathor commented Dec 5, 2021

I think this is all resolved in 0.4

@maroider
Copy link
Member Author

maroider commented Dec 5, 2021

I mean, we still haven't solved the Android issue, have we?

And I don't recall a specific resolution on !Send vs Send.

Might want to track those as separate issues, though, as this one has grown a bit long.

@Lokathor
Copy link
Contributor

Lokathor commented Dec 5, 2021

Yes please open one or more new issues, linking here and summarizing as necessary.

Particularly, most of what was proposed is the start of this issue has been done, so additional work seems to fit a new issue better.

@maroider
Copy link
Member Author

maroider commented Dec 5, 2021

Ok, tracking these in #84 and #85.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Development

No branches or pull requests

4 participants