-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reviving tail-call elimination #2691
Comments
This is besides the main point, but is there some reason this is not a suitable alternative to recursive typedefs? struct Instruction {
fun: fn(&mut Stack, &[Instruction])
} |
Yes, that's a great solution that I hadn't thought of EDIT: Updated original post to use this method |
It's worth noting that LLVM doesn't even emit tail calls for noreturn functions - it emits EDIT: Now that I think about it, this and the original example are probably both explicitly prevented by Rust for indirect function calls in order to preserve panic safety. Maybe Rust should only prevent TCE in the case that the caller has drop code to run. |
The reason LLVM can't emit tail calls for noreturn functions is that enabling TrapUnreachable means that they're no longer tail calls at all rust-lang/rust#45920 |
Would recommend explaining that TCE means tail call elimination in your post for those of us who had to read the thread to figure that out. |
Good point, I've fixed the title |
I am just here to add a small voice in support of TCE. Seems like this topic has not had any love in over a year :( |
Is there a reason we shouldn't require caller and callee signatures to match? That would allow the use of musttail. |
Mutual recursion? Requiring matching signatures is sort of acceptable as a stop-gap restriction assuming that the elimination requires the explicit use of a dedicated keyword. However, if it would be implicit, you'd get the nasty surprise of mutually recursive functions sometimes working and sometimes not, with no indication from the compiler. That would be pretty bad. |
Now that those issues appear to be resolved, could it be time to start talking about TCE again? |
Clang just merged support for guaranteed tail-calls: |
Can someone prepare a summary comment for the previous discussion? The most recent complete summary I think is #1888 (comment). There’s also this incomplete summary: #1888 (comment) |
This article about Clang's guaranteed tail-calls has been making the rounds on the internet https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html, it makes the case for why one would want such a feature in a programming language. |
The title is very unfortunate, as it only shows tail call optimization(TCO) for the complicate case of Note, that functional languages are often cheating here and put what they see as "the call stack" on the heap (which is of course significant slower due to the heap lookup (and this can even include additional checks). The basic rule of TCO is
So my advice would be to provide simple tail calls and forbid it for anything recursive, since even in the case of case 3 being less complex and expressible/derivable with lifetimes from the trait system LLVM has rather primitive to non-existing shape analysis in contrast to Rust. Sidenode: Terminology is poor, since TCO does not differentiate between recursive and non-recursive calls. |
The stack is also on the heap, there is no difference in access speed other than the stack just being so well used that it stays resident on the CPU, which is the case if you use some other part of memory as the stack as well. The only 'cost' is needing to pop an instruction frame off your stack to the SP if you want to call But yes, I would love full proper TCO, even if via a |
Checking that functions with explicit tail calls are memory-safe would be feasible, though, whether or not they're recursive.
I don't know Zig, but it looks like that issue is not related to the recursion. Passing a pointer to your own stack into a tail call would cause unwanted behavior whether or not the tail call is recursive. In Rust, passing a slice reference (not a reference to a slice reference) to data referenced by an incoming argument should work just fine.
If there's no recursion, stack size is always bounded and there is no need for guaranteed TCO in the first place. Given an explicit tail call, the borrow checker would need to give stack objects |
(Or better yet, if there is/can be a notion of a lifetime ending during a function call, that would fit nicely here) Or the borrow checker could treat |
Yeah, treating the |
Yeah it makes semantic sense to model it as a return followed by a call to the next function, you could consider it semantically like modelling a call to a tail-recursive function as having similar semantics to a loop that uses |
There is now a proposal for C23 on Tail-call elimination. |
In my understanding, this proposal is only about the Rust calling convention, totally internal, and doesn't affect interfaces with other languages. However, that would be great if we could provide TCE with other calling conventions like C and WASM for ABI compatibility with the clang's I'm currently working on a programming language project. And it uses trampoline functions for now to circumvent the lack of TCE in Rust. |
My proposal (#1888) solved the escape analysis problem using the borrow checker. Specifically, it desugared fn x() {
let a = Box::new(());
let b = Box::new(());
become y(a)
} as fn x() {
let a = Box::new(());
let b = Box::new(());
let _tmp = a;
drop(b);
become y(_tmp)
} This avoids all of these problems: the lifetimes of local variables end before the actual tail-call, so any attempt to pass a reference to them is a lifetime error and will be caught by borrowchk. No escape analysis is necessary or even desirable. |
The Wasm |
I appreciate the discussions here. What would be the next step to move this issue forward? |
Given the long history of this feature request my feeling is that the next step is to find a few people: namely, people from the lang and compiler team to act as mentors and support another person in their attempt to actually implement an initial support in |
Sounds like a plan. As I'm new to rust compiler development, do you know a place to find mentors? Is it on Zulip? I actually already know the issue you linked, thank you for the summary! I'm also interested in this feature to improve emulator performance. |
The zulip is absolutely a good place to start. |
The big open question to me is this: Can we make this the backend's problem, or does it need to be rust's problem? Said otherwise, is it a reasonable requirement to say that every possible backend must support this? Historically WASM didn't, so the answer was no. If WASM tail call support is landing, then it might be acceptable, but it's still evidence that there might well be desirable targets that don't support it. Certainly big complicated optimizing backends (like LLVM and GCC) will support it, but it's harder to know for sure for other ones. Suppose someone wrote a backend that generates C89 to use with a proprietary compiler that doesn't support TCO. Would we be ok saying that such a backend "isn't Rust"? (I arbitrarily checked MSIL/CIL, which appears to have I'm not sure how best to resolve it, but it has huge implications on most of the rest of the feature. That said, there's also some relatively more straight-forward work that could be done regardless of the core implementation strategy. For example, one could start implementing Or, I suppose, there's the other alternative of making some new language thing to enable a principled form of computed local goto, rather than doing these situations through TCO. (https://internals.rust-lang.org/t/pre-rfc-safe-goto-with-value/14470/9?u=scottmcm.) It's possible that expressing interpreters/emulators that way might be more natural than TCO anyway, if the goal is to jump between a constrained set of locally-defined things, rather than tailing out to arbitrary other things -- I'm not sure. |
Another possibility might be to make it the backend's problem, but also make sure it is always possible to implement using trampolines as a fallback- that might be acceptable if it only needs to be used on sufficiently-niche backends, such as part of bootstrapping. I can imagine a few different ways of going about that, by limiting the feature in various ways- |
Thank you all for your comments. Let me try to "summarize" the next things that can be worked on (this turned out to be quite long). Please correct me if I'm misunderstanding or missing something. First off, regarding the implementation, I spent some more time to read through the discussions on #1888. It seems to me that there is a consensus that using the become keyword (and the suggested scope transformation) is the sanest way to implement guaranteed tail calls of all discussed options. That said, as mentioned by @scottmcm, the big unknown is how to best implement this feature. One technique to reduce complexity is the trampoline as a fallback (also mentioned here by @rpjohnst). Other than this core issue, there are some questions regarding teaching and using of become, which can initially be investigated separately without the need for a functioning implementation. All in all it seems that a new attempt at a implementation would be the correct next step, even if the result is only investigative. Note that there is a implementation that is now ~6 years old (GitHub - DemiMarie/rust at explicit-tailcalls) and a more recent implementation for the frontend (#1888 (comment)) which might be good enough to guide the implementation, especially with the support of a mentor. As someone not used to the compiler development process, judging from the contribution guidelines we still need a RFC under which we can put / discuss the implementation, if this work should be integrated with Rust. As the pull request #1888 has only been postponed, I think another important step would be to revive that RFC. Note that the RFC was postponed because it did not fit with the goals of 2018 (comment) and while there was further discussion and notably developments in LLVM and WebAssembly since then it has not been reopened. There actually was a comment to ask for the RFC to be reopened which seems to have gone under the radar. Which leads to the question, what is the correct way to ask for a RFC to be reopened? Naturally, the RFC would have to be updated with the discussed points and a consensus on the different issues would need to be found, which hopefully can be done with the support of an implementation. So, coming to the actual list of TODOs:
|
Given Rust's current lack of support for tail calls, we cannot avoid using `async` for builtins. This is the only way to avoid overflowing the cpu stack when we have arbitrarily deep builtin/interpreted/builtin/interpreted/... "sandwiches" There are only five `async fn` functions which are not builtins (some come in multiple "flavors"): - add_values - resolve_with - force, final_deep_force - nix_eq, nix_cmp_eq - coerce_to_string These can be written iteratively rather than recursively (and in fact nix_eq used to be written that way!). I volunteer to rewrite them. If written iteratively they would no longer need to be `async`. There are two motivations for limiting our reliance on `async` to only the situation (builtins) where we have no other choice: 1. Performance. We don't really have any good measurement of the performance hit that the Box<dyn Future>s impose on us. Right now all of our large (nixpkgs-eval) tests are swamped by the cost of other things (e.g. fork()ing `nix-store`) so we can't really measure it. Builtins tend to be expensive operations anyways (regexp-matching, sorting, etc) that are likely to already cost more than the `async` overhead. 2. Preserving the ability to switch to `musttail` calls. Clang/LLVM recently got `musttail` (mandatory-elimination tail calls). Rust has refused to add this mainly because WASM doesn't support, but WASM `tail_call` has been implemented and was recently moved to phase 4 (standardization). It is very likely that Rust will get tail calls sometime in the next year; if it does, we won't need async anymore. In the meantime, I'd like to avoid adding any further reliance on `async` in places where it wouldn't be straightforward to replace it with a tail call. https://reviews.llvm.org/D99517 WebAssembly/proposals#157 https: //github.com/rust-lang/rfcs/issues/2691#issuecomment-1462152908 Change-Id: Id15945d5a92bf52c16d93456e3437f91d93bdc57 Reviewed-on: https://cl.tvl.fyi/c/depot/+/8290 Reviewed-by: tazjin <[email protected]> Tested-by: BuildkiteCI Autosubmit: Adam Joseph <[email protected]>
Just wanted to let you know that I'm still working on this. My plan is to create a (hopefully) comprehensive summary as the next step, though this is taking some time. (Also reading through the rustc dev guide currently). |
The main issue with general tail calls is that they require complex stack shuffling when the calling conventions don't match exactly. Consider this case: fn foo(arg: [u8; 1024]);
fn bar() {
// The calling convention requires this to be passed indirectly by pointer.
// But there is nowhere on the stack for the allocation to live!
become foo([5; 1024]);
} A more viable solution would be to only support restricted tail calls where the callee is required to have the exact same signature as the caller. The problem then becomes much simpler: we just have to overwrite our own arguments with new values and then transfer control to the target function. This is what Clang's However there's still the issue of backend and target support:
With that said, I do think these are solvable problems and I am happy to help as a mentor for this feature. The first step should be to open a new RFC, partially based on #1888, which proposes a new plan for supporting restricted tail calls (with the same-signature restriction). |
This is not enough for e.g. general functional programming, though. LLVM has full support for general tail calls and GHC and HiPE both rely on it. The difference is that those rely on a different calling convention that makes this easy. Even with the standard calling convention, it is still possible to implement fully general tail calls via a trampoline. |
Thank you for your input! I would be happy to prepare a new RFC as I have been reading related threads anyway. As already mentioned by @DemiMarie, regarding the exact same signature, my current understanding is also that this is still not quite enough, for example LLVM requires a specific calling convention (fastcc) which differs from the default calling convention normally used. One option that has been discussed is to change the default calling convention, which would have a wider impact (though it might be feasible #1888 (comment)). The alternative is to mark functions as tail-callable to limit the impact of an alternative calling convention. |
This would add significant overhead to all function calls, unless functions that perform tail calls are specifically marked in their signature. A trampoline solution basically involves a function returning the address of the next function to execute, and the caller having a loop to keep calling the next function.
A specific calling convention is only required for general tail calls. When the signatures match any calling convention can be used. |
I think it is a very good idea of @Amanieu to start implementing this feature as an MVP and the restriction to only tail call functions with the same signatures to make the initial implementation simpler and more feasible with the end goal of having general tail calls after some more iterations. This quickly gives us something to toy around with and be able to experiment with extensions. It is also how new features in Rust usually are developed and matured. |
Ah, fair enough. So if I understand correctly this should also apply to indirect function calls / calls to functions that are not known during compile time, as only the last function would need to clean up the stack. |
I think you could probably do something like this:
Since tail-calls are only really useful in mutually recursive situations, then it shouldn't be an issue, and it doesn't impose a cost on other code. eg. if you have functions A <-> B which both call each other, then to get the benefit of a tail call you need both A and B to use |
Tail-calls are also useful for continuation-passing style, where one will be tail-calling a trait object. |
I think any workable solution would require some annotation on the trait in that case - in this example you would have to give the trait method a "rustcall" convention. |
The other approach would be to modify LLVM to support fully general tail calls in all cases, using a trampoline if necessary. That’s out of scope for the MVP, but is necessary for the general case. |
In case someone is interested in the current state of the RFC, or wants to give some early feedback, it can be found here https://github.com/phi-go/rfcs/blob/guaranteed-tco/text/0000-guaranteed-tco.md. I hope to finish sometime next week. |
It would be great to have an issue that tracks this new RFC so that people can comment on it in a thread dedicated to the RFC. Usually those issues have links to the rendered markdown of the RFC in question. Or is the plan to recycle the old RFC tracking issue? |
RFCs are discussed in the pull request from what I can tell but I wanted to first finish an initial version before opening a PR. |
Alright, I finished up the RFC, you can find the PR here #3407. First time writing an RFC so I hope it's fine. |
Might be of interest: Blogpost about the new Wasm tail calls in the V8 engine. |
somewhat related: idea for safe computed goto by using a new enum |
It's been dead for a while, but I'd like to propose starting to think about TCE again. The discussion that I saw before was mostly centred around certain algorithms being more convenient to implement using recursion, but I'd like to give an example that's impossible to implement without TCE and that I actually genuinely do have a personal need for. To create a fast interpreter with 100% safe code, one possible implementation is like so:
This can compile to excellent assembly, with similar performance to a naive compiler modulo the worse instruction cache usage and jump for each instruction, but you can see in this toy implementation that it compiles to
call
+ret
using current Rust. With proper TCE error handling should just be the cost of creating the error plus aret
. Unfortunately, it is impossible to express this in Rust right now, since TCE is not guaranteed and there's no way to assume that it will happen in debug mode. Another issue is that recursivefn
definitions aren't allowed, but that doesn't prevent this from being written, it just prevents it from being safe.The text was updated successfully, but these errors were encountered: