Tracking issue for oom=panic (RFC 2116) #43596
Note that in previous topics I stated that libunwind doesn't handle OOM well, but this appears to be outdated or incorrect information. They preallocate their memory before an unwind is even requested. In the limit they might run out of memory and give up, but it's not like we've lost anything for trying.
I figured I'd comment on this with respect to #42808 and the outcome there, along with the @rust-lang/libs team discussion. It was discovered in #42808 that the integration of … Note, though, that this is not necessarily directly related to OOM but to allocator implementations themselves. Should we forbid, in general, panics from allocators? Should we only allow the oom method to panic? Unsure!
I personally think it is reasonable to forbid the allocator itself from panicking and instead have the …
@retep998 (I personally wanted to try to stuff the …
@pnkfelix Yep, that would do exactly what I want. It's my fault for not looking at the existing API and seeing that we already had that 😛
RFC rust-lang/rfcs#2116 was accepted, so I've repurposed this feature request into a tracking issue. See more in this thread's original message.
We now have …
Minor question: if someone sets …
I expect it would be mostly identical, though the error message that's printed might not be exactly the same. I don't know if it's worth forbidding explicitly even if it's not particularly useful. An observable effect (though again not particularly useful since …
Making allocation errors default to panic has undergone FCP successfully: #66741. I wonder if there's still a point to this cargo-level feature?
#66741 is only relevant in programs where …
I had sent my ideas to rust-lang/wg-allocators#62, but as it looks like there already is a tracking issue, let's move the comments here.

Assume a Rust server running multiple things on behalf of clients. It uses regular Rust code that is not written to care specifically about this particular use case. The server wants to limit the memory consumption clients could force upon it. It does so by having each client's allocations be in an arena allocator, and then wants to abort just that client's connection in case of "OOM." Unfortunately, this means that OOM must not abort the whole process, and AFAIU that is what the current behavior is, even with set_alloc_error_hook -- or at least that's how the docs read. I'm not sure …

Does that make sense?
It's probably the case that (a) the amount of memory needed to alloc panic data is fairly low and (b) you'd hit OOM from asking for some large amount that can't be offered. In this situation, you'd probably be able to allocate the panic data. However, if you fail to allocate while panicking you'll just get a panic, and a panic during a panic is an abort. So at least you're not trapped in a loop ;P

But also: you can't carelessly change the global allocator while allocated things are present in the world, or they'll potentially get freed on the wrong allocator, which is obviously bad. Basically the entire scenario you want can't be done with Rust's current conventions regarding memory allocation. You'd need specialized local-allocator-using code, or to wait for the alloc-wg to make the entire ecosystem more alloc-aware over time.
Just for clarification: I was assuming the existence of a global allocator that is able to use different memory pools for allocation depending on some thread-local. Entering a task's context would set the thread-local to the task's memory pool, and either exiting it or hitting an OOM during it would reset it to the global memory pool. It knows which memory pool to free from, without that pool necessarily being the same as the one that was configured during allocation, thanks to knowing the address of the memory block being freed. By doing this, I think that what I want to do can be achieved with the current conventions regarding memory allocations, but iff …
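For concreteness, here is a minimal sketch of that kind of pool-switching global allocator. All names and the pool bookkeeping are hypothetical; the sketch only forwards to the system allocator and marks in comments where a real implementation would charge a client's pool and report OOM for that client alone.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

// Which pool the current thread allocates from; usize::MAX means "no
// per-client pool, use the shared one". Const-initialized so that reading
// it is not itself expected to allocate. (Hypothetical bookkeeping.)
thread_local! {
    static CURRENT_POOL: Cell<usize> = const { Cell::new(usize::MAX) };
}

struct PoolSwitchingAlloc;

unsafe impl GlobalAlloc for PoolSwitchingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let pool = CURRENT_POOL.with(|p| p.get());
        // A real implementation would charge `layout.size()` against this
        // pool's budget and return null once the budget is exhausted, so
        // only that client's task observes the allocation failure.
        let _ = pool;
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Frees are routed by inspecting `ptr` (e.g. a header or an
        // address-range lookup), not by the thread-local, so memory is
        // always returned to the pool it actually came from.
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: PoolSwitchingAlloc = PoolSwitchingAlloc;
```

Entering a client's task would set `CURRENT_POOL` to that client's pool id and reset it on exit (or on OOM), matching the scheme described above.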
Oh, and I forgot to answer your initial points: unfortunately, a client able to make the server abort is a DoS vulnerability, so… that's not a good thing :( And I can totally see the client being able to fill the pool by making the server allocate one byte at a time -- the server could try to prevent that by keeping its own counter of how much memory each task has used, but that requires potentially large changes in every library used by the server, while the solution based on allocators would work without requiring cooperation from every piece of the code (including the parts I don't control) :)
Implement `-Z oom=panic`

This PR removes the `#[rustc_allocator_nounwind]` attribute on `alloc_error_handler`, which allows it to unwind with a panic instead of always aborting. This is then used to implement `-Z oom=panic` as per RFC 2116 (tracking issue rust-lang#43596). Perf and binary size tests show negligible impact.
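To illustrate what the flag is meant to enable, here is a sketch (assuming a nightly toolchain and a build with `-Z oom=panic`; without the flag, the failed allocation aborts the process instead of unwinding):

```rust
use std::panic;

fn main() {
    let result = panic::catch_unwind(|| {
        // Deliberately request an absurd amount of memory so that the
        // global allocator reports failure; under `-Z oom=panic` this
        // unwinds instead of aborting the whole process.
        let huge: Vec<u8> = Vec::with_capacity(usize::MAX / 4);
        huge.capacity()
    });

    match result {
        Ok(cap) => println!("allocation unexpectedly succeeded ({cap} bytes)"),
        Err(_) => println!("caught the allocation-failure panic; still running"),
    }
}
```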
I think this just needs to be stabilized. We might need to audit the standard library to check that we stay sound in the face of unwinding from allocations. Also, I'm not sure what the state of the documentation is.
What are the chances of stabilizing this soon?
I don't think std is ready for oom=panic yet. The unwinding path is not allocation-free, so currently oom=panic likely only works when a large allocation fails (but the allocator can still satisfy a few small ones).
That still seems like a strict improvement over the current situation.
@nbdd0121 As long as this behavior is safe (I assume it causes an abort, like a panic during a panic), I don't think this issue is a blocker to stabilization, because this deficiency doesn't change the public interface of …
Currently we panic with an error message, and that requires allocation; I consider that a publicly visible behaviour.
If the error message is fixed, we should be able to pre-allocate the necessary object (though it might require some hacky code in the panic internals).
Even for a fixed error message we need to allocate a … C++ doesn't have this issue because it uses a special … We could make the panic payload a ZST, and that would avoid the allocation problem on the unwind path entirely. All other allocations that we currently have can be avoided, I think.
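To make the ZST-payload idea concrete, here is a small sketch; the `AllocFailed` type is made up and is not the payload `std` would actually use. Boxing a zero-sized type performs no heap allocation, so raising and catching such a panic needs no payload memory beyond what the unwinder itself reserves.

```rust
use std::panic;

// Hypothetical zero-sized marker standing in for "allocation failed".
struct AllocFailed;

fn simulate_alloc_error() -> ! {
    // `panic_any` boxes its payload, but `Box::new` of a ZST does not
    // touch the heap, so this panic allocates nothing for its payload.
    panic::panic_any(AllocFailed)
}

fn main() {
    let result: Result<(), _> = panic::catch_unwind(|| simulate_alloc_error());
    if let Err(payload) = result {
        if payload.downcast_ref::<AllocFailed>().is_some() {
            println!("recovered from a simulated allocation failure");
        }
    }
}
```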
We’re constrained to …
There are two levels of allocation when panicking: first a …
The exception object doesn't need to be allocated on the heap. It can be …
@nbdd0121 What if the exception object is subsequently passed across a thread boundary, and the thread later terminates while the exception object is still present on some other thread's stack or heap?
Libunwind exception objects can't be sent to another thread. They are freed when the exception is caught and not rethrown. However, the process of unwinding could trigger additional memory allocations due to drop impls, which could cause another out-of-memory exception to be thrown. C++ handles this by having only one OOM exception object in TLS; if that one is already taken, the exception is allocated normally (with an abort if that allocation fails).
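A rough Rust rendering of that C++ scheme, purely to illustrate the control flow (all names are invented; this is not how std or libunwind actually manage the exception object):

```rust
use std::cell::Cell;

thread_local! {
    // One pre-reserved "emergency" OOM payload slot per thread.
    static EMERGENCY_SLOT_IN_USE: Cell<bool> = const { Cell::new(false) };
}

enum OomPayload {
    // Built in the reserved per-thread slot: no allocation required.
    Reserved,
    // The slot was already taken by an in-flight OOM (e.g. a drop impl hit
    // OOM while unwinding), so this payload had to be heap-allocated.
    Allocated(String),
}

fn make_oom_payload() -> OomPayload {
    EMERGENCY_SLOT_IN_USE.with(|in_use| {
        if !in_use.get() {
            in_use.set(true);
            OomPayload::Reserved
        } else {
            // A real implementation would abort if this fallback
            // allocation also failed.
            OomPayload::Allocated(String::from("nested allocation failure"))
        }
    })
}

fn main() {
    // The first OOM on this thread uses the reserved slot; a nested one
    // falls back to a normal allocation.
    assert!(matches!(make_oom_payload(), OomPayload::Reserved));
    if let OomPayload::Allocated(msg) = make_oom_payload() {
        println!("nested OOM payload: {msg}");
    }
}
```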
Wouldn't it make more sense to collapse multiple out-of-memory exceptions into one (or, say, keep a simple counter of the occurrences, not that it matters much) and stop executing that drop impl? I'm not sure how that would affect invariants and such, but the unwinding would be there either way, so I don't think it would change much. Are there downsides to that besides even more unpredictable control flow in OOM cases?
This breaks the pinning invariant for things on the stack that are pinned, and as such is unsound.
Update: This is now the tracking issue for a mechanism to make OOM / memory allocation errors panic (and so by default unwind the thread) instead of aborting the process. This feature was accepted in RFC rust-lang/rfcs#2116.
It was separated from #48043, which tracks another feature of the same RFC, since the two features will likely be stabilized at different times.
Blockers
This should be blocked until we can audit all places in the code that allocate to ensure they safely handle unwinding on OOM.
- `VecDeque::shrink_to` leads to UB if `handle_alloc_error` unwinds (#123369)

Steps:
`std::alloc::set_alloc_error_hook` (tracked at #51245, “Tracking issue for the OOM hook (`alloc_error_hook`)”) could potentially be this mechanism, but that hook is currently documented as not allowed to unwind.
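For reference, a sketch of installing that hook today (assumes a nightly toolchain with `#![feature(alloc_error_hook)]`; as documented, the hook must not unwind, and under the default behaviour the process still aborts after the hook returns):

```rust
#![feature(alloc_error_hook)]

use std::alloc::{set_alloc_error_hook, Layout};

fn main() {
    // Log the failed request. The hook may not unwind, and the process
    // still terminates after it returns under the default oom=abort.
    set_alloc_error_hook(|layout: Layout| {
        eprintln!("allocation of {} bytes failed", layout.size());
    });
}
```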
Original feature request:
Several users who are invested in the "thread is a task" model that Rust was originally designed around have expressed a desire to unwind in the case of OOM. Specifically each task has large and unpredictable memory usage, and tearing down the first task that encounters OOM is considered a reasonable strategy.
Aborting on OOM is considered unacceptable because it would be expensive to discard all the work that the other tasks have managed to do.
We already (accidentally?) specified that all allocators can panic on OOM, including the global one, with the introduction of generic allocators on collections. So in theory no one "should" be relying on the default `oom=abort` semantics. This knob would only affect the default global allocator. Presumably there would just be a function somewhere in the stdlib which is cfg'd to be a panic or abort based on this flag?

This is not a replacement for the proper fallible allocation handling routines requested in #29802, because most of the potential users of that API build with `panic=abort`. Also, the requesters of this API are unwilling to go through the effort of ensuring all their dependencies use fallible allocations everywhere.

I am dubious about this API, but I'm filing an issue so that it's written down and everyone who wants it has a place to go and assert that it is in fact good and desirable.