Feature Name: explicit_tail_calls
Start Date: 2023-04-01
RFC PR: rust-lang/rfcs#3407
Rust Issue: rust-lang/rust#112788

Summary

While tail call elimination (TCE) is already possible via tail call optimization (TCO) in Rust, there is no way to guarantee that a stack frame must be reused. This RFC describes a language feature providing tail call elimination via the become keyword providing this guarantee. If this guarantee can not be provided by the compiler a compile time error is generated instead.

Motivation

Tail call elimination (TCE) allows stack frames to be reused. While TCE via tail call optimization (TCO) is already supported by Rust, as is normal for optimizations, TCO will only be applied if the compiler expects an improvement by doing so. However, the compiler can't have ideal analysis and thus will not always be correct in judging if an optimization should be applied. This RFC shows how TCE can be guaranteed in Rust.

The guarantee for TCE is interesting for two general goals. One goal is to do function calls without growing the stack, this mainly has semantic implications as recursive algorithms can overflow the stack without this guarantee. The other goal is to avoid paying the cost to create a new stack frame, replacing call instructions by jmp instructions, this optimization has performance implications and can provide massive speedups for algorithms that have a high density of function calls. This goal also depends on the guarantee as otherwise a subtle change or a new compiler version can have an unexpected impact on performance.

Note that workarounds for the first goal exist by using trampolining which limits the stack depth. However, while this functionality can be provided as a library, inclusion in the language can provide greater adoption of a more functional programming style.

For the second goal, TCO can have the intended effect, however, there is no guarantee. This can result in unexpected slow-downs, for example, as can be seen in this issue.

Some specific use cases that are supported by this feature are new ways to encode state machines and jump tables, code written in a continuation-passing style, ensuring recursive algorithms do not overflow the stack, and guaranteeing good code generation for interpreters. For a language like Rust that considers performance-oriented uses to be in scope, it is important to support these kinds of programs.

Examples from the C/C++ ecosystem

(This section is based on this comment, all credit goes to @traviscross.)

The C/C++ ecosystem already has access to guaranteed TCE via Clang's musttail attribute and GCC/Clang's computed goto. Based on the assumption that code which uses musttail or computed gotos would also use become in Rust, we can gauge the impact of this feature by collecting a list of example programs that would not be replicable in Rust without this RFC.

The list of programs is generated as follows:

GitHub was searched for uses of musttail and uses of computed goto. GitHub's search only returns five pages, so this is only a sampling.
The most popular projects are picked and each result is checked to confirm that musttail or computed gotos are used.
Additionally, for musttail, which was only introduced in Clang 13, projects that have comments which indicate the desire to use musttail once legacy compiler support can be dropped are included as well. (Of which, there are two: FreeRADIUS and Pyston).
Some projects use musttail (either Clang's or LLVM's) for code generation only, these are placed in a separate section. It is noted which of these projects expose guaranteed TCE to user code. (One project, Swift, uses it both internally and for code generation.)

The resulting list of notable projects using musttail:

Protobuf
Swift
Skia (a graphics library from Google)
CHERIoT RTOS (a realtime operating system with memory safety)
FreeRADIUS (a RADIUS implementation) (planning to use)
Example BPF code from a Cloudflare blog post
RSM (a virtual computer in the form of a virtual machine)
Tails (a Forth-like interpreter)
Jasmin (a stack-based byte-code interpreter)
upb (a small protobuf implementation in C)
wasm3 ("the self-proclaimed fastest WebAssembly interpreter")

The resulting list of notable projects using computed goto:

The Linux kernel
PostgreSQL
CPython
MicroPython (a lean Python implementation)
Godot (a 2D/3D game engine)
Ruby (they use it in their regex engine as well as in their interpreter)
HHVM (a PHP implementation from Facebook)

The resulting list of notable projects using musttail for code generation:

Swift
Zig (+ guaranteed TCE exposed to user code)
GHC (+ guaranteed TCE exposed to user code)
Clang (+ guaranteed TCE exposed to user code)
Julia
Firefly (a BEAM/Erlang implementation) (+ guaranteed TCE exposed to user code)
MLton (a Standard ML compiler) (+ guaranteed TCE exposed to user code)
Pyston (a performance-optimizing JIT for Python) (planning to use)

Guide-level explanation

Pretending this RFC has already been accepted into Rust, it could be explained to another Rust programmer as follows.

Tail Call Elimination

Rust supports a way to guarantee tail call elimination (TCE) for function calls using the become keyword. If TCE is requested for a call, and all requirements are fulfilled, the called function will reuse the stack frame of the calling function. The requirements, described in detail below, are checked by the compiler and a compiler error will be raised if they are not met. Note that TCE can opportunistically also be performed by Rust using tail call optimization (TCO), this will cause TCE to be used if it is deemed to be "better" (as in faster, or smaller if optimizing for space).

TCE is interesting for two groups of programmers: Those that want to use recursive algorithms, which can overflow the stack if the stack frame is not reused; and those that want to create highly optimized code as creating new stack frames can be expensive.

The become keyword can be thought of similarly as return as both keywords act as the end of the current function. The main difference is that the argument to become needs to be a function call. However, there are several requirements on the called function which need to be fulfilled for TCE to be guaranteed, these are checked by the compiler.

The main restriction is that the argument to become is a tail call, a call that is the last action performed in the function. Supported are calls such as become foo(), become foo(a), become foo(a, b), become foo(1 + 1), become foo(bar()), become foo.method(), or become function_table[idx](arg). Calls that are not in the tail position can not be used, for example, become foo() + 1 is not allowed. In the example, the function would need to be evaluated and then the addition would need to take place.

A further restriction is on the function signature of the caller and callee. The stack frame layout is based on the calling convention, arguments, as well as return types (the function signature in short). As the stack frame is to be reused it needs to be similar enough for both functions. This requires that the function signature and calling convention of the calling and called function need to match exactly.

Additionally, there is a further restriction on the arguments. As the stack frame of the calling function is reused, it needs to be cleaned up, so that the called function can take the space. This is nearly identical to the clean up that happens when returning from a function, all local variables, that are not returned or in the case of become used in the function call, are dropped. For become, however, dropping necessarily happens before entering the called function. As a result, it is not possible to pass references to local variables, nor will the called function "return" to the calling function.

If any of these restrictions are not met when using become a compilation error is thrown.

Note that using this feature can make debugging more difficult. As become causes the stack frame to be reused, debugging context is lost. Expect to no longer see any parent functions that used become in the stack trace, or have access to their variable values while debugging.

As this feature is strictly opt-in and the become keyword is already reserved, this has no impact on existing code.

Teaching

For new Rust programmers this feature should probably be introduced late into the learning process, it requires understanding some advanced concepts and the current use cases are likely to be niche. So it should be taught similarly as to programmers that already know Rust.

Examples

On to some examples. Starting with how return and become differ, two example use cases, and some potential pitfalls.

The difference between `return` and `become`

The difference to return is that become drops function local variables before the become function call instead of after. To be more specific a become expression acts as if the following events occurred in-order:

Function call arguments are evaluated into temporary storage. If a local variable is used as a value in the arguments, it is moved.
All local variables in the caller are destroyed according to usual Rust semantics. Destructors are called where necessary. Note that values moved from step 1 are not dropped.
The caller's stack frame is removed from the stack.
Control is transferred to the callee's entry point.

This implies that it is invalid for any references into the caller's stack frame to outlive the call. The borrow checker ensures that none of the above steps will result in the use of a value that has gone out of scope.

See the example below on how become causes drops to be elaborated.

fn x(_arg_zero: Box<()>, _arg_one: ()) {
    let a = Box::new(());
    let b = Box::new(());
    let c = Box::new(());

    become y(a, foo(b));
}

The drops will be elaborated by the compiler like this:

fn x(_arg_zero: Box<()>, _arg_one: ()) {
    let a = Box::new(());
    let b = Box::new(());
    let c = Box::new(());

    // Move become arguments to temporary variables.
    let function_ptr = y; // The function pointer could be the result of an expression like: fn_list[fn_idx];
    let tmp_arg0 = a;
    let tmp_arg1 = foo(b);

    // End of the function, all variables not used in the `become` call are dropped, as would be done after a `return`.
    // Return value of foo() is *not* dropped as it is moved in the become call to y().
    drop(c);
    // `b` is *not* dropped because it is moved due to the call to foo().
    // `a` is *not* dropped as it is used in the become call to y().
    drop(_arg_one);
    drop(_arg_zero);

    // Finally, `become` the called function.
    become function_ptr(tmp_arg0, tmp_arg1);
}

If we used return instead, the drops would happen after the call:

fn x(_arg_zero: Box<()>, _arg_one: ()) {
    let a = Box::new(());
    let b = Box::new(());
    let c = Box::new(());
    return y(a, foo(b));
    // Normal drop order:
    // Return value of foo() is *not* dropped as it is moved in the call to y().
    // drop(c);
    // `b` is *not* dropped because it is moved due to the call to foo().
    // `a` is *not* dropped because it is moved to the callee y().
    // drop(_arg_one);
    // drop(_arg_zero);
}

This early dropping allows the compiler to avoid many complexities associated with deciding if the stack frame can be reused. Instead, the heavy lifting is done by the borrow checker, which will produce a lifetime error if references to local variables are passed to the called function. This is distinct from return, which does allow references to local variables to be passed. Indeed, this difference in the handling of local variables is also the main difference between return and become.

Use Case 1: Recursive Algorithm

A simple example is the following algorithm for summing the elements of a slice. While this would usually be done with iteration in Rust, this example illustrates a simple use of become. Without guaranteed TCE, this example could overflow the stack if TCO is not applied.

fn sum_slice(data: &[u64], accumulator: u64) -> u64 {
    match data {
        [first, rest @ ..] => become sum_slice(rest, accumulator + first),
        [] => accumulator,
    }
}

Use Case 2: Interpreter

For an interpreter, the usual loop is to get an instruction, match on that instruction to find the corresponding function, call that function, and finally return to the loop to get the next instruction. (This is a simplified example.)

fn exec_instruction(mut self) {
    loop {
        let next_instruction = self.read_instr();
        match next_instruction {
            Instruction::Foo => self.execute_instruction_foo(),
            Instruction::Bar => self.execute_instruction_bar(),
        }
    }
}

This example can be turned into the following code, which no longer does any calls and instead just uses jump instructions. (Note that this example might not be the optimal way to use become.)

fn execute_instruction_foo(mut self) {
    // foo things ...

    become self.next_instruction();
}

fn execute_instruction_bar(mut self) {
    // bar things ...

    become self.next_instruction();
}

fn next_instruction(mut self) {
    let next_instruction = self.read_instr();
    match next_instruction {
        Instruction::Foo => become self.execute_instruction_foo(),
        Instruction::Bar => become self.execute_instruction_bar(),
    }
}

Pitfall 1: Function calls as arguments are not tail call eliminated.

(original example)

The guarantee of TCE is only provided to the function call that is an argument to become, it is not given to calls that are arguments, see the following example:

fn add(a: u64, b: u64) -> u64 {
    a + b
}

pub fn calc(a: u64, b: u64) -> u64 {
    if a < b {
        return a
    }

    let n = a - b;
    become add(calc(n, 2), calc(n, 1));
}

In this example become will guarantee TCE only for the call to add() but not for the calc() calls. Running this code will likely end up in a stack overflow as the recursive calls are to calc() which are not TCE'd.

Pitfall 2: Omission of the `become` keyword causes the call to be `return` instead.

(original example)

This is a potential source of confusion, indeed in a functional language where calls are expected to be TCE this would be quite unexpected. (Maybe in functions that use become a lint should be applied that enforces usage of either return or become in functions where at least one become is used.)

fn foo(x: i32) -> i32 {
    if x % 2 {
        let x = x / 2;
        // one branch uses `become`
        become foo(x);
    } else {
        let x = x + 3;
        // the other does not
        foo(x) // == return foo(x);
    }
}

Pitfall 3: Alternating `become` and `return` calls still grows the stack.

(original example)

Here one function uses become the other return, this is another potential source of confusion. This mutual recursion would eventually overflow the stack. As mutual recursion can also happen across more functions, become needs to be used consistently in all functions if TCE is desired. (Maybe it is also possible to create a lint for these use cases as well.)

fn foo(n: i32) {
    // oops, we forgot become ..
    return bar(n); // or alternatively: `bar(n)`
}

fn bar(n: i32) {
    become foo(n);
}

Pitfall 4: Caller location aka `panic::Location::caller` and `#[track_caller]`

Since #[track_caller] adds an argument, #[track_caller] functions can only tail call other #[track_caller] functions. So the following example will not compile.

use std::panic::Location;

/// Returns the [`Location`] at which it is called.
#[track_caller]
fn get_caller_location() -> &'static Location<'static> {
    Location::caller()
}

/// Returns a [`Location`] from within this function's definition.
fn get_just_one_location() -> &'static Location<'static> {
    become get_caller_location();
}

Reference-level explanation

Implementation of this feature requires checks that all prerequisites to guarantee TCE are fulfilled. These checks are:

The become keyword is only used in place of return. The intent is to reuse the semantics of a return signifying "the end of a function". See the section on tail-call-elimination for examples.
The argument to become is a function (or method) call, that exactly matches the function signature and calling convention of the callee. The intent is to ensure a matching ABI. Note that lifetimes may differ as long as they pass borrow checking, see below for specifics on the return type.
The stack frame of the calling function is reused, this also implies that the function is never returned to. The required checks to ensure this is possible are: no borrows of local variables are passed to the called function (passing local variables by copy/move is ok since that doesn't require the local variable to continue existing after the call), and no further cleanup is necessary. These checks can be done by using the borrow checker as already described in the section showing the difference between return and become above.
The restrictions, caused by interactions with other features, are followed. See below for details, the restrictions mostly concern caller context and callee signatures.

If any of these checks fail a compiler error is issued. It is also suggested to ensure that the invariants provided by the pre-requisites are maintained during compilation, raising an ICE if this is not the case.

One additional check must be done, if the backend cannot guarantee that TCE will be performed an ICE is issued. To be specific the backend is required that: "A tail call will not cause unbounded stack growth if it is part of a recursive cycle in the call graph".

The type of the expression become <call> is ! (the never type, see here). This is consistent with other control flow constructs such as return, which also have the type of !.

Note that as become is a keyword reserved for exactly the use-case described in this RFC there is no backward-compatibility break. This RFC only specifies the use of become inside of functions and instead leaves usage outside of functions unspecfied for use by other features.

This feature will have interactions with other features that depend on stack frames, for example, debugging and backtraces. See drawbacks for further discussion.

See below for specifics on interactions with other features.

Coercions of the Tail Called Function's Return Type

All coercions that do any work (like deref coercion, unsize coercion, etc) are prohibited. Lifetime-shortening coercions (&'static T -> &'a T) are allowed but will be checked by the borrow checker.

Reference/pointer coercions of the return type are not supported to minimize implementation effort. Though, coercions which don't change the pointee (&mut T -> &T, *mut T -> *const T, &T -> *const T, &mut T -> *mut T) could be added in the future.

Never-to-any coercions (! -> T) of the return type are not supported to minimize implementation effort. They are difficult to implement and require backend support. To be clear, this only concerns functions that have the never return type like the following example:

fn never() -> ! {
    loop {}
}

fn tail_call_never_type() -> usize {
    become never(); //~ error: mismatched types
}

Closures

Tail calling closures and tail calling from closures is not allowed. This is due to the high implementation effort, see below, this restriction can be lifted by a future RFC.

Closures use the rust-call unstable calling convention, which would need to be adapted to guarantee TCE. Additionally, any closure that has captures would need special handling, since the captures would currently be dropped before the tail call.

Variadic functions using `c_variadic`

Tail calling variadic functions and tail calling from variadic functions is not allowed. As support for variadic function is stabilized on a per target level, support for tail-calls regarding variadic functions would need to follow a similar approach. To avoid this complexity and to minimize implementation effort for backends, this interaction is currently not allowed but support can be added with a future RFC.

Generators

Tail calling from generators is not allowed. As the generator state is stored internally, tail calling from the generator function would require additional support to function correctly. To limit implementation effort this is not supported but can be supported by a future RFC.

Async

Tail calling from async functions or async blocks is not allowed. This is due to the high implementation effort as it requires special handling for the async state machine. This restriction can be relaxed by a future RFC.

Using become on a .await expression, such as become f().await, is also not allowed. This is because become requires a function call and .await is not a function call, but is a special construct.

Note that tail calling async functions from sync code is possible but the return type for async functions is impl Future, which is unlikely to be interesting.

Operators are not supported

Invocations of operators were considered as valid targets but were rejected on grounds of being too error-prone. In any case, these can still be called as methods. One example of their error-prone nature (source):

pub fn fibonacci(n: u64) -> u64 {
    if n < 2 {
        return n
    }
    become fibonacci(n - 2) + fibonacci(n - 1)
}

In this case, a naive author might assume that this is going to be a stack space-efficient implementation since it uses tail recursion instead of normal recursion. However, the outcome is more or less the same since the critical recursive calls are not actually in tail call position.

Further confusion could result from the same-signature restriction where the Rust compiler raises an error since fibonacci and <u64 as Add>::add do not share a common signature.

Drawbacks

As this feature should be mostly independent of other features the main drawback lies in the implementation and maintenance effort. This feature adds a new keyword which will need to be implemented not only in Rust but also in other tooling. The primary effort, however, lies in supporting this feature in the backends:

LLVM supports a musttail marker to indicate that TCE should be performed docs. Clang which already depends on this feature seems to only generate correct code for the x86 backend source (as of 2023-03-30).
GCC does seem to support an equivalent musttail marker, though it is only accessible via the libgccjit API (source).
For WebAssembly see the discussion here. While general tail calls seem far off, supporting only matching function signatures seems more feasible. Also note that WebAssembly accepted tail calls into the standard and Cranelift is now working towards supporting it.

Additionally, this proposal is limited to exactly matching function signatures which will not allow general tail-calls, however, the work towards this initial version is likely to be useful for a more comprehensive version.

There is also an unwanted interaction between TCE and debugging. As TCE by design elides stack frames this information is lost during debugging, that is the parent functions and their local variable values are incomplete. As TCE provides a semantic guarantee of constant stack usage it is also not generally possible to disable TCE for debugging builds as then the stack could overflow. (Still, maybe a compiler flag could be provided to temporarily disable TCE for debugging builds. As suggested here, another option would be special support for become by a debugger. With this support the debugger would keep track of the N most recent calls providing at least some context to the bug.)

Rationale and alternatives

In this section, the reason for choosing the design is discussed as well as possible alternatives that have been considered.

Why is this design the best in the space of possible designs?

Of all possible alternatives, this design best fits the tradeoff between implementation effort and functionality while also offering a starting point toward further exploration of a more general implementation. Regarding implementation effort, this design requires the least of backends while not already implementable via a library, see here. Regarding functionality, the proposed design requires function signatures to match, however, this restriction still allows tail calls between functions that use different arguments. This can be done by requiring the programmer to add (unused) arguments to both function definitions so that they match. Additionally, the chosen design allows tail calls for dynamic calls and other variations.

Creating a Function Local Scope

Using the become keyword creates a function local scope that drops all variables not used in the tail call, as would be done for return. There is no alternative to this approach as the stack frame needs to be prepared for the tail call.

Backend Requirements

The main hurdle to implementing this feature is the required work to be done by the backends (e.g. LLVM). See the following list for an overview of approaches that have been considered for this RFC, going by increasing demand on the backends:

Internal Transformation - Use a MIR transformation to implement tail calls without any backend requirements, possible implementations are: Defunctionalization and Trampolines. However, all proposed options can only support static function calls. This is one reason this option is not chosen, as dynamic function calls seem too important to ignore, see here. Another reason is that this approach can already be done by libraries.
Matching Function Signatures (this RFC) - Require that the caller and callee have matching function signatures. With this restriction, it is possible to do tail calls regardless of the calling convention used, see the next point for why this is important. Though the calling convention needs to match between caller and callee. All that needs to be done by the backends is to overwrite the arguments in place with the values for the tail call. By requiring a matching calling convention and function signature between caller and callee the ABI is guaranteed to match as well. This is also quite similar to how musttail is used in practice for Clang: "Implementor experience with Clang shows that the ABI of the caller and callee must be identical for the feature to work [...]" (see here).
Mark Function Definition - One hurdle to guaranteeing tail calls is that the default calling conventions used by backends usually do not support tail calls and instead another calling convention needs to be used. As it is unreasonable to expect changing the default calling convention, one option is to mark functions that should use a calling convention amenable to tail calls. This requires that backends can support tail calls when allowed to change the calling convention. This requirement, however, already seems quite difficult to establish. For example, this thread discusses why this approach is not reasonable.
Backend Specific - Depend on the backend to decide if a tail call can be performed. While this approach allows gradual advancements it also seems the most unstable and difficult to use.

As described by the list of approaches above, this RFC specifies the approach that is most attainable and still useful in practice while not already implementable via a library.

What should be Tail Callable

Tail calls can be implemented without backend support if only static calls are supported, see the following list for reasons why other calls should be supported:

Dynamic Calls One example that depends on dynamic tail calls is a C implementation of a Protobuf parser, see here.
Calls across Crates This will allow tail calls to library functions, enabling libraries to support code that requires constant stack usage or make calls more performant.
Calls to Dynamically Loaded Functions As an example this would be useful to improve performance for an emulator that uses a JIT.

What other designs have been considered and what is the rationale for not choosing them?

There are some designs that either can not achieve the same performance or functionality as the chosen approach. Though most other designs evolve around how to mark what should be a tail-call or marking what functions can be tail called. There is also the possibility of providing support for a custom backend (e.g. LLVM) or MIR pass.

Rust Built-in Functionality

For simple tail recursion on an iterable, successors can be used to compute a result for each element. Nearly equivalently, a (combinator)[https://users.rust-lang.org/t/when-will-rust-have-tco-tce/20790/3] can be used to express a tail recursive function, however, allowing more flexibility regarding the returned result.

These approaches do not provide a way to express general tail calls, so do not fulfill a basic requirement we would like to achieve.

Trampoline based Approach

There could be a trampoline-based approach (comment) that can fulfill the semantic guarantee of using constant stack space, though they can not be used to achieve the performance that the chosen design is capable of.

Similarly, as mentioned here, an approach used by Chicken Scheme is to do normal calls and handle stack overflows by cleaning up the stack.

Principled Local Goto

One alternative would be to support some kind of local goto natively, indeed there exists a pre-RFC (comment) or another (LLVM specific idea)[https://internals.rust-lang.org/t/idea-for-safe-computed-goto-using-enums/21787?u=programmerjake]. These designs should be able to achieve the targeted performance and stack usage, though they seem to be quite difficult to implement or are backend specific. Also they do not seem to be as flexible as the chosen design, especially implementing tail calls to indirect or external functions do not seem to be feasible.

Attribute on Function Declaration

One alternative is to mark a group of functions that should be mutually tail-callable example with some follow-up discussion.

The goal behind this design is to allow TCE of functions that do not have exactly matching function signatures, in theory, this just requires that tail-called functions are callee cleanup, which is a mismatch to the default calling convention used by Rust. To limit the impact of this change all functions that should be TCE-able should be marked with an attribute.

While quite noisy it is also less flexible than the chosen approach. Indeed, TCE is a property of the call and not a function definition, sometimes a call should be guaranteed to be TCE, and sometimes not, marking a function would be less flexible.

Adding a mark to `return`

The return keyword could be marked using an attribute or an extra keyword as in the example below.

fn a() {
    // The chosen variant.
    become b();

    // Using an attribute.
    #[become]
    return b();

    // Adding an extra keyword.
    return become b();
}

These alternatives mostly come down to personal taste (or bikeshedding) and the plain keyword become was chosen because of the following reasons:

It is reserved exactly this use case.
It is shorter to write.
The behavior changes in subtle ways compared to a plain return. To clearly indicate this change in behavior a stronger distinction from return than adding a mark seems warranted.
- TCE as proposed in this RFC requires dropping local variables before the function call instead of after with return.
- From a type system perspective the type of the return expression (!, the never type, see here for an example) stays the same even when adding one of the markings. This means that type-checking can not help if the marking is forgotten or added mistakenly. (Note that the argument, the function call, can still be type checked, just not the return expression.)

Require Explicit Dropping of Variables

(Based on this comment)

An alternative approach could be to refuse to compile functions that would need to run destructors before become. This would force code to not rely on implicit drops and require calls to drop(variable), as in the following example:

fn f(x: String) {
    drop(x); // necessary
    become g();
}

This approach would result in more verbose code but would also be easier to read for people not familiar with tail calls. Also, it is forwards compatible with implicit dropping before become.

The reason this approach is not chosen is that the tradeoff between increased verbosity and the reduction of initial learning time seems to not be worth it. Additionally, implementing the diagnostic for forgotten drops can be expected to be more effort than for correct drop elaboration.

Custom Compiler or MIR Passes

One more distant alternative would be to support a custom compiler or MIR pass so that this optimization can be done externally. While supported for LLVM Zulip, for MIR this is not supported discussion.

This would be an error-prone and unergonomic approach to solving this problem.

What is the impact of not doing this?

Rust's goal is to empower everyone to build reliable and efficient software. (source)

This feature provides a crucial optimization for some low-level code. It seems that without this feature there is a big incentive for developers of those specific applications to use other system-level languages that can guarantee TCE.

Additionally, this feature enables recursive algorithms that require TCE, which would provide better support for functional programming in Rust.

If this is a language proposal, could this be done in a library or macro instead? Does the proposed change make Rust code easier or harder to read, understand, and maintain?

While there exist libraries for a trampoline-based method to avoid growing the stack, this is not enough to achieve the possible performance of real TCE, so this feature requires support from the compiler itself.

Prior art

Functional languages (such as OCaml, SML, Haskell, Scheme, and F#) usually depend on proper tail calls as a language feature (TCE for general calls). For system-level languages, TCE is usually wanted but implementation effort is a common reason this is not yet done. Even languages with managed code such as .Net or ECMAScript (as per the standard) also support TCE, again performance and resource usage were the main motivators for their implementation.

See below for a more detailed description of select compilers and languages.

Clang

Clang, as of April 2021, does offer support for a musttail attribute on return statements in both C and C++. This functionality is enabled by the support in LLVM, which should also be the first backend for an initial implementation in Rust.

It seems this feature is received with "excitement" by those that can make use of it, a popular example of its usage is to improve Protobuf parsing speed. However, one issue is that it is not very portable and there still seems to be some problem with the implementation.

For a more detailed description see this excerpt from the description of the feature, taken from the implementation:

Guaranteed tail calls are now supported with statement attributes [[clang::musttail]] in C++ and __attribute__((musttail)) in C. The attribute is applied to a return statement (not a function declaration), and an error is emitted if a tail call cannot be guaranteed, for example if the function signatures of caller and callee are not compatible. Guaranteed tail calls enable a class of algorithms that would otherwise use an arbitrary amount of stack space.

If a return statement is marked musttail, this indicates that the compiler must generate a tail call for the program to be correct, even when optimizations are disabled. This guarantees that the call will not cause unbounded stack growth if it is part of a recursive cycle in the call graph.

If the callee is a virtual function that is implemented by a thunk, there is no guarantee in general that the thunk tail-calls the implementation of the virtual function, so such a call in a recursive cycle can still result in unbounded stack growth.

clang::musttail can only be applied to a return statement whose value is the result of a function call (even functions returning void must use return, although no value is returned). The target function must have the same number of arguments as the caller. The types of the return value and all arguments must be similar according to C++ rules (differing only in cv qualifiers or array size), including the implicit "this" argument, if any. Any variables in scope, including all arguments to the function and the return value must be trivially destructible. The calling convention of the caller and callee must match, and they must not be variadic functions or have old style K&R C function declarations.

There is also a proposal for the C Standard outlining some limitations for Clang.

Clang requires the argument types, argument number, and return type to be the same between the caller and the callee, as well as out-of-scope considerations such as C++ features and the calling convention. Implementor experience with Clang shows that the ABI of the caller and callee must be identical for the feature to work; otherwise, replacement may be impossible for some targets and conventions (replacing a differing argument list is non-trivial on some platforms).

GCC

GCC does not support a feature equivalent to Clang's musttail, there also does not seem to be a push to implement it (pipermail) (as of 2021). However, there also exists an experimental plugin for GCC last updated in 2021.

Zig

Zig provides separate syntax to allow more flexibility than normal function calls. There are options for async calls, inlining, compile-time evaluation of the called function, or specifying TCE on the call. (source)

The following is an example taken from here (https://zig.godbolt.org/z/v13vrjxG4, a toy lexer using tail calls in Zig):

export fn lex(data: *Data) callconv(.C) u32
{
    if(data.cursor >= data.input.len)
        return data.tokens;
    switch(data.input[data.cursor]) {
        'a' => return @call(.always_tail, lex_a, .{data}),
        'b' => return @call(.always_tail, lex_b, .{data}),
        else => return @call(.always_tail, lex_err, .{data}),
    }
}

Carbon

As per this issue it seems providing TCE is of interest even if the implementation is difficult.

.Net

The .Net JIT does support TCE as of 2020, a main motivator for this feature was improving performance. Pull Request (Issue)

This implements tailcall-via-help support for all platforms supported by the runtime. In this new mechanism the JIT asks the runtime for help whenever it realizes it will need a helper to perform a tailcall, i.e. when it sees an explicit tail. prefixed call that it cannot make into a fast jump-based tailcall.

ECMA Script / JS

rust-lang#1888 (comment) (Feb, 2018)

Technically the ES6 spec mandates tail-calls, but the situation in reality is more complicated than that.

The only browser that actually supports tail calls is Safari (and Webkit). And the Edge team has said that it's unlikely that they will implement tail calls (for similar reasons as Rust: they currently use the Windows ABI calling convention, which doesn't work well with tail calls).

Therefore, tail calls in JS is a very controversial thing, even to this day

Just to be clear, the Edge team is against implicit tail-calls for all functions, but they're in favor of tail-calls-with-an-explicit-keyword (similar to this RFC).

An unofficial summary of the ECMA Script/ Javascript proposal for tail call/return carbon-language/carbon-lang#1761 (comment) (Jul, 2022)

Unresolved questions

What parts of the design do you expect to resolve through the RFC process before this gets merged?
- One point that needs to be decided is if TCE should be a feature that needs to be required from all backends or if it can be optional. Currently, the RFC specifies that an ICE should be issued if a backend cannot guarantee that TCE will be performed.
What parts of the design do you expect to resolve through the implementation of this feature before stabilization?
- Are all calling-convention used by Rust available for TCE with the proposed restrictions on function signatures?
- Is there some way to reduce the impact on debugging and other features?
What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?
- Supporting general tail calls, the current RFC restricts function signatures which can be loosened independently in the future.

Resolved Questions

Can generic functions be supported?
- As Rust uses Monomorphization, generic functions are not a problem.
Can dynamic function calls be supported?
- Dynamic function calls are supported (confirmation).
Can functions outside the current crate be supported, functions from dynamically loaded libraries?
- Same as dynamic function calls these function calls are supported (confirmed for LLVM).
Can closures be supported?
- Closures are not supported see here.
Can async functions be supported?
- Async functions are not supported see here.
Should "performance" be guaranteed by the backends?
- "Performance" is not guaranteed by the backends, see here.

Future possibilities

Lints

The functionality introduced by RFC also has possible pitfalls, it is likely worthwhile to provide lints that warn of these issues. See the discussion here for possible lints.

Additionally, there can be another class of lints, those that guide migration to using become. For example, provide a lint that indicates if a trivial transformation from return to become can be done for function calls where all requisites are already fulfilled. Note that, this lint might be confusing and noisy.

Helpers

It seems possible to keep the restriction on exactly matching function signatures by offering some kind of placeholder arguments to pad out the differences. For example:

fn foo(a: u32, b: u32) {
    // uses `a` and `b`
}

fn bar(a: u32, _b: u32) {
    // only uses `a`
}

Maybe it is useful to provide a macro or attribute that inserts missing arguments.

#[pad_args(foo)]
fn bar(a: u32) {
    // ...
}

Relaxing the Requirement of Strictly Matching Function Signatures for Static Calls

It should be possible to automatically pad the arguments of static tail calls, similar to the helpers section above. See this comment for details. Note that this approach does not relax requirements for dynamic calls.

Relaxing the Requirement of Strictly Matching Function Signatures with a new Calling Convention

In the future, a calling convention could be added to allow become to be used with functions that have mismatched function signatures. This approach is close to the alternative of adding a marker to the function declaration. Same as the alternative, a requirement needs to be added that backends provide a calling convention that support tail calling.

Mismatches in Mutability

Mismatches in mutability (like &T <-> &mut T) for arguments and return type of the function signatures are currently not supported as they are different types. However, this mismatch could be supported if there is a guarantee that mutability has no effect on ABI. For more details, see here.

Performance Guarantee

First of all, performance is ambiguous. As the stand in we use the requirement that no new stack frame is created for a tail call. The reason for this choice is that creating a new stack frame can be a large part of computation time in hot loops that do calls, this is a code construct that can likely be optimized with tail calls.

Can the requirement to not create new stack frames when using tail calls be imposed on backends? This answer seems to be no, even for LLVM. LLVM provides some guarantees for tail calls, however, none do ensure that no new stack frame is created (as of 24-05-2023).

If it turns out that in practice the no new stack frame requirement is not already kept it might be worthwhile to revisit this performance requirement.

Functional Programming

This might be wishful thinking but if TCE is supported there could be further language extensions to make Rust more attractive for functional programming paradigms. Though it is unclear to me how far this should be taken or what changes exactly would be a benefit.

Files

0000-explicit-tail-calls.md

Latest commit

History

0000-explicit-tail-calls.md

File metadata and controls

Summary

Motivation

Examples from the C/C++ ecosystem

Guide-level explanation

Tail Call Elimination

Teaching

Examples

The difference between return and become

Use Case 1: Recursive Algorithm

Use Case 2: Interpreter

Pitfall 1: Function calls as arguments are not tail call eliminated.

Pitfall 2: Omission of the become keyword causes the call to be return instead.

Pitfall 3: Alternating become and return calls still grows the stack.

Pitfall 4: Caller location aka panic::Location::caller and #[track_caller]

Reference-level explanation

Coercions of the Tail Called Function's Return Type

Closures

Variadic functions using c_variadic

Generators

Async

Operators are not supported

Drawbacks

Rationale and alternatives

Why is this design the best in the space of possible designs?

Creating a Function Local Scope

Backend Requirements

What should be Tail Callable

What other designs have been considered and what is the rationale for not choosing them?

Rust Built-in Functionality

Trampoline based Approach

Principled Local Goto

Attribute on Function Declaration

Adding a mark to return

Require Explicit Dropping of Variables

Custom Compiler or MIR Passes

What is the impact of not doing this?

If this is a language proposal, could this be done in a library or macro instead? Does the proposed change make Rust code easier or harder to read, understand, and maintain?

Prior art

Clang

GCC

Zig

Carbon

.Net

ECMA Script / JS

Unresolved questions

Resolved Questions

Future possibilities

Lints

Helpers

Relaxing the Requirement of Strictly Matching Function Signatures for Static Calls

Relaxing the Requirement of Strictly Matching Function Signatures with a new Calling Convention

Mismatches in Mutability

Performance Guarantee

Functional Programming

The difference between `return` and `become`

Pitfall 2: Omission of the `become` keyword causes the call to be `return` instead.

Pitfall 3: Alternating `become` and `return` calls still grows the stack.

Pitfall 4: Caller location aka `panic::Location::caller` and `#[track_caller]`

Variadic functions using `c_variadic`

Adding a mark to `return`