Draft RFC to add a `must_not_await` lint #16

LucioFranco · 2020-07-08T18:34:25Z

Here is the initial work on a RFC to add support for a yield safe lint.

@bIgBV and I have been working on this draft and would love feedback :)

Reference #14

Rendered

RFC on rust-lang/rfcs: rust-lang/rfcs#3014

rfc-drafts/yield_safe_lint.md

taiki-e · 2020-07-08T18:48:05Z

related discussion: rust-lang/rfcs#2890

A comparison with marker trait based approach may be useful.

rfc-drafts/yield_safe_lint.md

yoshuawuyts · 2020-07-08T20:51:57Z

rfc-drafts/yield_safe_lint.md

+
+# Summary
+
+Introduce a `#[yield_unsafe]` lint in the compiler that will warn the user when they unsafely hold an unsafe yield struct across a yield boundary. 


[...] the compiler that will warn the user when they unsafely hold an unsafe yield struct across a yield boundary

It seems the term "unsafely" doesn't seem to refer to memory safety here? The terms "safe" and "unsafe" have very specific meanings in Rust. I would strongly caution against repurposing the terminology of safe and unsafe for anything unrelated to its current use.

Teaching people about unsafe Rust is pretty hard already, and it's often misunderstood. It would be bad for the language if the meaning of "unsafe" within Rust became dependent on context.

Instead I would suggest using terminology such as:

#[no_yield] / "can be yielded" / "cannot be yielded"

#[disallow_yield] / "allowed to be yielded" / "not allowed to be yielded"

#[deny_yield] / "allowed to be yielded" / "not allowed to be yielded"

(actually, this seems mentioned in "Unresolved questions" section)

There is a case where "safe" is used as a safe API in places where it doesn't have to do with memory safety (std::panic::UnwindSafe), but it seems "unsafe" doesn't.

#[deny_yield] / "allowed to be yielded" / "not allowed to be yielded"

It seems odd that the lint name warned by default is "deny_*".

There is a case where "safe" is used as a safe API in places where it doesn't have to do with memory safety (std::panic::UnwindSafe), but it seems "unsafe" doesn't.

I don't believe that's quite true. From the explainer in the docs of what "unwind safety" is:

It is possible, however, for logical invariants to be broken in Rust, which can end up causing behavioral bugs. Another key aspect of unwind safety in Rust is that, in the absence of unsafe code, a panic cannot lead to memory unsafety.

Which means that if a type is !UnwindSafe, it's not memory safe to unwind and the code won't compile. This seems quite different than what the #[yield_unsafe] lint introduces?

What I wanted to say is that the term "safe" is used in APIs that you can never violate memory safety without other unsafe code, even if implemented incorrectly.

https://github.com/rust-lang/rfcs/blob/master/text/1236-stabilize-catch-panic.md#background-what-is-exception-safety-in-rust

Despite these methods to subvert the mitigations placed by default in Rust, a key part of exception safety in Rust is that safe code can never lead to memory unsafety, regardless of whether it panics or not. Memory unsafety triggered as part of a panic can always be traced back to an unsafe block.

Another point on naming: "yield" isn't a documented term in Rust today. While I would very much like to use terminology that's future-proof for when generators stabilize, it seems like it would be quite confusing in the meantime.

Maybe we could use "async" terminology for now, and simply add "yield" terminology as a synonym in the future. The RFC should discuss future options and potential drawbacks, if any.

Yeah, I am not sold on unsafe it was just the best way I could think of explaining it to people that were familiar with async concepts. I would love to hear more proposals I am very open to naming.

@tmandry yeah, that is a really good point. We should probably use await then? But that also limits us when we do get generators out (which idk when that will happen) so not a huge deal we can duplicate the macro.

#[deny_hold_await]

#[disallow_across_await] this seems promising and similar to what @yoshuawuyts suggested.

taiki-e · 2020-07-08T22:49:08Z

Rendered
(You can also see it from "Display the rich diff" button in "Files changed")

yoshuawuyts · 2020-07-09T08:54:49Z

rfc-drafts/yield_safe_lint.md

+    - This is probab
+
+
+# Rationale and alternatives


I'm curious why it was chosen to use an attribute instead of a trait, following std::panic::UnwindSafe.

I think @nikomatsakis wanted us to also take a look at what is going on with MustUse. I think we still want this to be some sort of attribute similar to must_use but we could also support a trait style too.

I would want to parallel #[must_use], I think? But I've shifted my opinion and am now open to using a trait for both I think.

tmandry · 2020-07-10T01:23:08Z

rfc-drafts/yield_safe_lint.md

+# Rationale and alternatives
+
+
+# Prior art


A clippy lint was added: rust-lang/rust-clippy#5439

LucioFranco · 2020-07-10T14:54:25Z

Thanks for the feedback everyone <3. Will update the draft sometime next week and apply all the suggestions. Thanks!

LucioFranco · 2020-07-14T18:18:52Z

Do people have any opinions on calling this #[deny_await]? It feels short enough but conveys the correct message.

tmandry · 2020-07-16T22:54:07Z

Do people have any opinions on calling this #[deny_await]? It feels short enough but conveys the correct message.

deny is already an attribute that takes a parameter, which would make it confusing to explain when talking aloud. I think someone pointed out above that you might be left with things like #[allow(deny_await)], which isn't the end of the world but can contribute to the confusion.

Another point of confusion is that it sounds like the thing you're trying to avoid is awaiting the object itself, instead of holding the object across an await point.

Still, I haven't seen a name that I really like yet, and this is short enough and gets the point across pretty well.

Other ideas:

#[must_not_await] (to go along with #[must_use])
#[lint_on_await]
#[never_suspend]

LucioFranco · 2020-07-17T01:46:53Z

I think #[must_not_await] works extremely well!

betamos · 2020-07-20T23:53:54Z

I have been thinking of a related lint for blocking methods. With blocking, I mean both when yielding the current thread (primarily in sync IO & timers) and compute-heavy functions (e.g. generating prime numbers). Accidentally calling a blocking function in async can cause the executor to hickup for as long as the call takes, degrading the performance globally.

Blocking methods are not annotated, and rarely documented. Some multithreaded executors can work around this by work-stealing, but that cannot generally be assumed. There are no workarounds for singlethreaded executors. Hence, avoiding blocking code today requires a vigilant user in the loop. I believe that an annotation on blocking functions would enable Clippy to provide an important and actionable lint.

The relationship to this RFC is that sync mutices have blocking API methods. The lock() method is blocking (although the try_lock() method is not). As such, calling lock() in an async scope is, imo, a red/yellow flag by itself. The fact that lock() happens to be non-blocking if you never hold the guard across an await is AFAIK entirely specific to mutex and its siblings. A general annotation+lint for blocking APIs could alert users to switch to async APIs. This principle would apply to all synchronous IO, synchronization primitives, timers and compute-heavy operations.

I still think the proposed lint is useful, due to the deadlock risk, but there are legitimate use-cases where a mutex must be held across an await point. If we could only afford one annotation, I'd pick one that is more general. Apologies for derailing the conversation, but these seem related. Happy to open a new issue.

LucioFranco · 2020-07-21T17:09:54Z

@betamos could you expand on this comment?

but there are legitimate use-cases where a mutex must be held across an await point.

I have yet to see a use-case for this, beyond "you should just use an async awaire one".

LucioFranco · 2020-07-21T18:12:31Z

Even async-aware mutexes can deadlock if you hold their guard across an await, no?

It can deadlock in the normal manners that a std::sync::Mutex can deadlock. Rust doesn't provide any protection against that anyways so for me that is out of scope. The key here is that to acquire a new lock will poll and yield anyways if the async lock guard is held across a yield boundary. This is the problem that a sync mutex will deadlock in weird ways because of the implementation details of async/futures/runtimes.

locking a non-async-aware mutex anywhere in an async context

I'm not convinced this is a red flag. Short critical sections and locking on that in a low contention environment is totally fine. Its something that just needs to be measured. I don't think we've totally explored all the possibilities for it either. But for example, in a very heavy read vs write work load a regular RwLock will do just fine.

The goal of this lint in general is to merely just inform the user that something weird may happen because the compiler transformed your code in a way you may not have expected. It's not here to protect a user from every deadlock.

betamos · 2020-07-21T18:39:11Z

It can deadlock in the normal manners that a std::sync::Mutex can deadlock. Rust doesn't provide any protection against that anyways so for me that is out of scope.

+1. If a mutex is never held across an await, it will not deadlock. This is true for either type of mutex.

If you want to model something like transactionality, you may need to hold it across an await-point. This can be sound and used correctly, but puts more responsibility on the programmer. I like the current RFC since it shoos most users away from it when introduced by accident.

I'm not convinced this is a red flag. Short critical sections and locking on that in a low contention environment is totally fine.

std::Mutex::lock sometimes translates to yielding the current thread. Using an async lock from the get-go is the preferable option IMO, so if holding-across-await is needed later, it doesn't prompt a migration to an async lock. (As an example, a sync method that uses a sync mutex must be changed to async upon migration, which is a breaking change).

But for example, in a very heavy read vs write work load a regular RwLock will do just fine.

Why not use an async-aware RWLock? I can see the lack of library support but is there anything else?

To sum up: I'm not suggesting you must never call potentially blocking code in async. Of course you can override if you know what you're doing ™. It seems easier to teach "never call blocking methods" than "this particular method is blocking under certain circumstances, so avoid those circumstances".

LucioFranco · 2020-07-21T18:50:28Z

std::Mutex::lock sometimes translates to yielding the current thread. Using an async lock from the get-go is the preferable option IMO,

Ah yeah, so this is where I think we are disconnected. I generally suggest if your critical section is small enough and doesn't need to last across a yield then always use a sync mutex. This is because the cost of creating the waker is a bit more expensive. (There may be opportunity here to investigate and make it faster). We have seen in benchmarks that the async versions will be slower. I don't think the yielding from the sync mutex creates that much of a problem though. I thought the yield is mostly a hint to the scheduler. That said, I am not 100% sure on that impact but is something worth measuring.

Why not use an async-aware RWLock? I can see the lack of library support but is there anything else?

Same issue as I said above it costs much more than a regular one.

To sum up: I'm not suggesting you must never call potentially blocking code in async. Of course you can override if you know what you're doing ™. It seems easier to teach "never call blocking methods" than "this particular method is blocking under certain circumstances, so avoid those circumstances".

Agreed, to me its mostly a case of poor docs. But we just published a new tokio site that has some docs on shared state. https://tokio.rs/tokio/tutorial/shared-state Hopefully this is a bit more explicit and easier for the user to understand the why. I would love to write some sort of blog on this with some actual numbers though instead of (my own) hand waving 😄

tmandry · 2020-07-21T19:07:58Z

I don't think the yielding from the sync mutex creates that much of a problem though. I thought the yield is mostly a hint to the scheduler.

I think @betamos is referring to the contended case where your thread sleeps until the mutex is unlocked again. In that case the cost is high (and all progress on async tasks stops on your thread, of course).

This is because the cost of creating the waker is a bit more expensive. (There may be opportunity here to investigate and make it faster).

I agree there probably is; you should only need to create the waker in a contended case, in which case it should be less expensive than a futex.

LucioFranco · 2020-07-21T19:11:45Z

I think @betamos is referring to the contended case where your thread sleeps until the mutex is unlocked again. In that case the cost is high (and all progress on async tasks stops on your thread, of course).

Right, if you have high contention that is true but also work-stealing can help in this case. There is some balance here that needs to be figured out but there are a lot of variables. So not sure exactly how we can communicate that.

I agree there probably is; you should only need to create the waker in a contended case, in which case it should be less expensive than a futex.

Would be good to benchmark, but agreed we could probably do better as well.

bIgBV · 2020-07-21T20:23:53Z

I don't think the yielding from the sync mutex creates that much of a problem though. I thought the yield is mostly a hint to the scheduler.

I think @betamos is referring to the contended case where your thread sleeps until the mutex is unlocked again. In that case the cost is high (and all progress on async tasks stops on your thread, of course).

Imo the problem of when it is safe to use a blocking lock vs a non-blocking lock is too complex to be solved by a single RFC or feature. I think we should focus on a smaller subset of the problem. This RFC initially came to be because of the unusual/unexpected failure modes that one might encounter when holding a blocking lock across await points. This is mostly due to the non-obvious way in which the rust compiler transforms async await code, which most newcomers to the async await world of rust would most likely not have a good understanding of. And they shouldn't need to!

The goal is to highlight the problem and steer the user towards a solution which they are most likely looking for. In this sense, I think a attribute which lints such types would help.

As far as the bigger problem on blocking code inside async code goes, I think that's a more general problem which an attribute lint can't necessarily solve. I think we need to tackle that separately.

betamos · 2020-07-21T21:16:46Z

We have seen in benchmarks that the async versions will be slower.

Thanks for providing that context. As I side note, I'd be very interested in peeking at those.

This is because the cost of creating the waker is a bit more expensive.

Nod, thanks for confirming my suspicions. The waker overhead seems like an unfortunate fact-of-life for multi-threaded executors, and is concerning on a much broader scale. Anyway, I'm rambling off topic.

There may be opportunity here to investigate and make it faster

Yes, it seems like in the uncontended case there would be no reason to register/clone a waker? I don't know of anything inherit about an async mutex that demands a perf hit.

As far as the bigger problem on blocking code inside async code goes, I think that's a more general problem which an attribute lint can't necessarily solve.

What's the complexity? I'd classify blocking as:

The function may yield the thread or
The function is long-running, for some epsilon (subject to bike-shedding).

I think we need to tackle that separately.

That would be fine. Don't let this block. I just wanted to shed some light on the problem space.

tmandry · 2020-09-16T00:43:27Z

rfc-drafts/must_not_await_lint.md

+
+# Motivation
+
+Enable users to fearlessly write concurrent async code without the need to understand the internals of runtimes and how their code will be affected. The goal is to provide a best effort warning that will let the user know of a possible side effect that is not visible by reading the code right away. Some examples of these side effects are holding a `MutexGuard` across an await bound in a single threaded runtime. In this case the resulting generated future will resolve to `!Send` but could still hold the lock when the future yields back to the executor. This opens up for the possibility of causing a deadlock since the future holding onto the lock did not relinquish it back before it yielded control. This can become even more problematic for futures that run on single-threaded runtimes (`!Send`) where holding a local after a yield will result in a deadlock.


Suggested change

Enable users to fearlessly write concurrent async code without the need to understand the internals of runtimes and how their code will be affected. The goal is to provide a best effort warning that will let the user know of a possible side effect that is not visible by reading the code right away. Some examples of these side effects are holding a `MutexGuard` across an await bound in a single threaded runtime. In this case the resulting generated future will resolve to `!Send` but could still hold the lock when the future yields back to the executor. This opens up for the possibility of causing a deadlock since the future holding onto the lock did not relinquish it back before it yielded control. This can become even more problematic for futures that run on single-threaded runtimes (`!Send`) where holding a local after a yield will result in a deadlock.

Enable users to fearlessly write concurrent async code without the need to understand the internals of runtimes and how their code will be affected. The goal is to provide a best effort warning that will let the user know of a possible side effect that is not visible by reading the code right away.

One example of these side effects is holding a `MutexGuard` across an await bound. This opens up the possibility of causing a deadlock since the future holding onto the lock did not relinquish it back before it yielded control. This is a problem for futures that run on single-threaded runtimes (`!Send`) where holding a local after a yield will result in a deadlock. Even on multi-threaded runtimes, it would be nice to provide a custom error message that explains why the user doesn't want to do this instead of only a generic message about their future not being `Send`.

You might want to hint in this section that other kinds of RAII guards may want this (which you talk about below). Someone who only reads to here might think this doesn't have enough use cases for a general mechanism.

tmandry · 2020-09-16T00:45:01Z

rfc-drafts/must_not_await_lint.md

+Provide a lint that can be attached to structs to let the compiler know that this struct can not be held across an await boundary.
+
+```rust
+    #[must_not_await]


Like must_use we should allow a custom error message to be provided.

tmandry · 2020-09-16T00:45:59Z

rfc-drafts/must_not_await_lint.md

+
+```rust
+    #[must_not_await]
+    struct MyStruct {}


nit: can you un-indent these code blocks?

tmandry · 2020-09-16T00:49:22Z

rfc-drafts/must_not_await_lint.md

+    struct MyStruct {}
+```
+
+This struct if held across an await boundary would cause a warning:


Suggested change

This struct if held across an await boundary would cause a warning:

This struct if held across an await boundary would cause a deny-by-default warning:

deny-by-default is a new term for me. Do you have any links to any documentation about this type of behavior?

https://doc.rust-lang.org/rustc/lints/listing/deny-by-default.html

They are basically errors that you can disable with #[allow].

tmandry · 2020-09-16T00:50:09Z

rfc-drafts/must_not_await_lint.md

+- [`std::sync::MutexGuard`](https://doc.rust-lang.org/std/sync/struct.MutexGuard.html)
+- [`tracing::span::Entered`](https://docs.rs/tracing/0.1.15/tracing/span/struct.Entered.html)
+
+This will be a best effort lint to signal to the user about unintended side-effects of using certain types across an await boundary.


Suggested change

This will be a best effort lint to signal to the user about unintended side-effects of using certain types across an await boundary.

This will be a best effort lint to signal the user about unintended side-effects of using certain types across an await boundary.

tmandry · 2020-09-16T00:50:52Z

rfc-drafts/must_not_await_lint.md

+
+# Reference-level explanation
+
+Going throuogh the prior are we see two systems currently which provide simailar/semantically similar behavior:


This seems more like it belongs in either the motivation or the rationale and alternatives section, to me at least.

rfc-drafts/must_not_await_lint.md

tmandry · 2020-09-16T00:53:53Z

rfc-drafts/must_not_await_lint.md

+`#[must_use]` is implemented as an attribute, and from prior art and [other literature][linear-types], we can gather that the decision was made due to the complexity of implementing true linear types in Rust. [`std::panic::UnwindSafe`][UnwindSafe] on the other hand is implemented as a marker trait with structural composition.
+
+
+## High level design overview


Maybe there's precedent for this that I don't know about, but I don't feel that implementation notes like this belong in an RFC.

tmandry · 2020-09-16T00:54:50Z

rfc-drafts/must_not_await_lint.md

+ - Reference link on how mir transfroms async fn https://tmandry.gitlab.io/blog/posts/optimizing-await-2/
+
+# Drawbacks
+- There is a possibility it can produce a false positive warning and it could get noisy. We likely want to allow overriding via some sort of module level `allow` attribute.


Yes we do (and it should work at the item level as well), leveraging the same mechanism as other deny-by-default lints.

tmandry · 2020-09-16T00:55:51Z

rfc-drafts/must_not_await_lint.md

+
+# Future possibilities
+
+- Propagate the lint in nested structs/enums. Similar to the use case for the `must_use` attribute. These likely should be solved together.


I think we should answer this question before stabilizing. If we add this afterwards it'll create unnecessary breakage.

I think we should include this in this change. In my opinion, this would be the best experience for the user of this attribute.

If we do that it'll be inconsistent with #[must_use]. If that's the end goal we want we should explain why they should be different, or discuss the possibility of changing #[must_use] in the future.

Strawman: This is about what types are safe to include (transitively or not) in a Future object. #[must_use] is more about how individual values are handled in the code. The state of things today is that new types already have to think about #[must_use], but rarely do they think about yielding and async/await, and we shouldn't expect them to.

This was brought up before by @nikomatsakis in an earlier comment. There was a PR which introduced the behavior we are speaking of for #[must_use] but was closed because of crater breakages.

That particular conversation was biased towards having the same behavior across both attributes.

It would certainly increase the scope of this project especially so if it's a breaking change we might have to ship it as a part of the 2021 edition (if it's happening).

Maybe for now we can add an "Unresolved questions" section and add this to it?

tmandry

Looking good, I think this is almost ready to merge.

tmandry · 2020-09-22T22:10:45Z

rfc-drafts/must_not_await_lint.md

+
+The compiler might output something along the lines of:
+
+TODO: Write a better error message.


You can show your custom error message here.

The generic default could be as straightforward as "MyStruct should not be held across an await"

tmandry · 2020-09-22T22:11:37Z

rfc-drafts/must_not_await_lint.md

+[trait declaration]: https://doc.rust-lang.org/reference/items/traits.html
+[impl trait]: https://doc.rust-lang.org/reference/types/impl-trait.html
+
+### Auto trait vs attribute


I'm not sure this belongs in the reference section. Is this an unresolved question, or does it belong in the rational section?

This answers the question of why we chose to go the attribute route instead of creating an auto-trait. I think this should be in the rationale section.

tmandry · 2020-09-22T22:12:54Z

rfc-drafts/must_not_await_lint.md

+
+# Future possibilities
+
+- Propagate the lint in nested structs/enums. Similar to the use case for the `must_use` attribute. These likely should be solved together.


Maybe for now we can add an "Unresolved questions" section and add this to it?

rfc-drafts/must_not_await_lint.md

LucioFranco · 2020-10-05T19:16:01Z

Looks like most questions have been resolved, I'd like to give a call for last comments and see if we can get this merged and opened as an actual RFC 😄

tmandry

Left a few comments, otherwise looks good to me.

tmandry · 2020-10-07T18:40:36Z

rfc-drafts/must_not_await_lint.md

+
+- Any RAII guard might possibly create unintended behavior if held across an await boundary.
+
+This lint will enable the compiler to warn the user that the generated MIR could produce unforeseen side effects. Some examples of this are:


nit: The MIR is more of an implementation detail. It's just the defined semantics of async functions, namely the fact that they suspend execution.

tmandry · 2020-10-07T18:56:52Z

rfc-drafts/must_not_await_lint.md

+
+The `must_not_await` attribute may include a message by using the [`MetaNameValueStr`] syntax such as `#[must_not_await = "example message"]`.  The message will be given alongside the warning.
+
+When used on a user-defined composite type, if the [expression] of an [expression statement] has this type and is used across an await point, then this lint is violated.


I don't think expressions or expression statements are necessarily right here – a let binding that's in scope during an await could cause the same behavior. A temporary could also be held across an await point in the return expression of a function, for instance.

The existing reference doesn't give a whole lot of scaffolding to work with. Maybe "if a value exists during an await point" is the best we can do for now. If we want to get more formal than "value".. "value expression, temporary, or variable"?

tmandry · 2020-10-07T18:59:34Z

rfc-drafts/must_not_await_lint.md

+}
+```
+
+When used on a function, if the [expression] of an [expression statement] is a [call expression] to that function, and the expression is held across an await point, this lint is violated.


I also find the expression wording here a bit confusing. I think we can do something similar to the above here, and reference "the value returned by the function."

Now that I think about it, it isn't obvious to me that we want to allow this attribute on functions. Are there any use cases where we'd want it on the function but not the type? I realize this is consistent with #[must_use], though.

Yeah, there's two aspects to this

1.) Being consistent with #[must_use]
2.) Being able to mark the result of any expression as #[must_not_await] and I think this was the reason why the #[must_use] attribute works this way in the first place.

tmandry · 2020-10-07T19:00:52Z

rfc-drafts/must_not_await_lint.md

+}
+```
+
+When used on a [trait declaration], a [call expression] of an [expression statement] to a function that returns an [impl trait] of that trait and if the value is held across an await point, the lint is violated.


tmandry · 2020-10-07T19:03:42Z

rfc-drafts/must_not_await_lint.md

+
+Going through the prior are we see two systems currently which provide simailar/semantically similar behavior:
+
+## Clippy `#[await_holding_lock]` lint


Suggested change

## Clippy `#[await_holding_lock]` lint

## Clippy `await_holding_lock` lint

That's the name of the lint but I don't think it's an attribute

bIgBV · 2020-10-09T20:50:31Z

@tmandry I've addressed your comments and as discussed on zuplip, I'm merging this PR. Thanks again for all your help and @nikomatsakis as well.

PS: I'll be opening an RFC on the RFCs repo over the weekend.

jonhoo · 2020-11-09T01:43:42Z

@bIgBV Did you open an RFC PR for this? I couldn't find it with a quick search. Would probably be useful to link to it here (maybe in the first post @LucioFranco?)

LucioFranco · 2020-11-09T14:49:23Z

@jonhoo I think somethings have come up for both @bIgBV and I, but we should be opening this soon. Sorry for the delay, life has kinda been crazy the last few weeks :)

jimblandy · 2021-01-21T18:33:06Z

Probably RefMut should be included in this class. Holding one across an await will cause the next attempt to acquire any sort of reference to the RefCell to panic.

Draft RFC to add a yield safe lint

6d8aaf1

LucioFranco self-assigned this Jul 8, 2020

taiki-e reviewed Jul 8, 2020

View reviewed changes

rfc-drafts/yield_safe_lint.md Outdated Show resolved Hide resolved

programmerjake reviewed Jul 8, 2020

View reviewed changes