Inlined function duplication across complex branches when extern "Rust" is used with LTO and opt-level="s"
#102295
Labels: A-LTO, I-heavy, O-Arm, O-msp430, T-compiler
Context
The example code I linked/described here is an MCVE. See Background For "Real" Applications section for details.
- The main binary calls `free(f)` within its `main()`.
- `free()` takes a closure `f` with a branch (`?`) as input, and in turn calls `f` and then a function called `release()`.
- The main binary has a feature called `use-extern-cs`. When disabled, the bodies of both `free()` and `release()` are provided by an external crate called `critical`. When enabled, the `free()` function is provided by the main binary instead of `critical`, and the `release()` function is marked as `extern "Rust"` in the main binary's source file.
- In the `critical` crate, the `release()` function may or may not be marked as `#[inline]`. This is controlled by the `critical/inline` feature.

Instructions
1. If testing `msp430`, make sure the `msp430-elf-gcc` toolchain is installed. Optionally install `just` for convenience.
2. `git clone https://github.com/cr1901/msp430-size`. Use commit b8ef905 specifically. Despite the name of the repo, this code works for `thumbv6m-none-eabi` as well; the behavior appears to be arch-agnostic.
3. Make sure a nightly Rust toolchain is installed (for `-Zbuild-std=core`).
4. Run the following command:

   where:
   - `$TARGET`: either `msp430-none-elf` or `thumbv6m-none-eabi`.
   - `$FEATURES`: empty, `use-extern-cs`, `critical/inline`, or `use-extern-cs,critical/inline`.
5. Examine the output LLVM, assembly, and object/ELF files with `objdump` and look for a series of ten `nop`s once or multiple times. Each `nop` sled represents a call to `release`.

Expected Behavior
The body of `release` appears once for the single call to `free()`, regardless of which combination of features is enabled (including none).

Actual Behavior
The body of `release` appears twice in the single call to `free()` for all combinations of features, except for `--features=critical/inline`.

Other Hints
- In my real application, I could use an `#[inline]` attribute to prevent `release`'s body from being duplicated. However, I could not translate this behavior well from my real application to the MCVE. One way that I found works is to remove the `extern "Rust" fn release()` declaration, and paste the `critical::internal::release()` impl directly in the main source file.
- The `extern "Rust"` declaration seems to prevent `#[inline]` hints from working at all.
- When `rustc` decides to duplicate `release`, sometimes `rustc` will inline one call of `release` into `free`, but not the other.
- The `release` duplication appears in the LLVM files emitted by `rustc`.

Background For "Real" Applications
The embedded Rust community has started to standardize around a pluggable `critical-section` crate. The `critical-section` crate by necessity marks some functions as `extern "Rust"` and defers to other crates to define them. Specifically, the `critical_section::free(f)` function takes a closure `f()` and calls in order (args omitted):

1. `extern "Rust" acquire()`
2. `f()`
3. `extern "Rust" release()`

The crate doesn't define any new functionality for embedded Rust applications; it rather changes how existing functionality (critical sections) is implemented. In principle, the crate should be drop-in to existing embedded Rust applications.
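
The call order above (acquire, `f`, release) can be sketched on a hosted target as follows. This is a minimal illustration, not the `critical-section` crate's actual code: the symbol names `_sketch_acquire` and `_sketch_release`, the `DEPTH` counter, and the `provider` module are all hypothetical, and real implementations would disable and restore interrupts rather than touch a counter. It is written with the `unsafe extern` / `#[unsafe(no_mangle)]` spelling accepted by Rust 1.82 and later; code from the time of this issue would use plain `extern "Rust"` and `#[no_mangle]`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical stand-in for interrupt state: counts how many critical
// sections are currently open.
static DEPTH: AtomicUsize = AtomicUsize::new(0);

// The "library" side only declares the hooks. Their definitions are resolved
// at link time, which is how `extern "Rust"` defers to another crate.
unsafe extern "Rust" {
    fn _sketch_acquire();
    fn _sketch_release();
}

// The "provider" side. In a real project this would live in a different
// crate; it shares a file here only to keep the sketch self-contained.
mod provider {
    use super::{Ordering, DEPTH};

    #[unsafe(no_mangle)]
    fn _sketch_acquire() {
        DEPTH.fetch_add(1, Ordering::SeqCst);
    }

    #[unsafe(no_mangle)]
    fn _sketch_release() {
        DEPTH.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Calls, in order: acquire, `f`, release.
pub fn free<R>(f: impl FnOnce() -> R) -> R {
    unsafe { _sketch_acquire() };
    let r = f();
    unsafe { _sketch_release() };
    r
}

fn main() {
    let inside = free(|| DEPTH.load(Ordering::SeqCst));
    let after = DEPTH.load(Ordering::SeqCst);
    println!("depth inside closure: {inside}, depth afterwards: {after}");
}
```

Because `free()` only sees declarations of the two hooks, the compiler has no bodies for them until the provider's definitions are linked in.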
When I transitioned some embedded Rust code to use the `critical-section` crate, I noticed marked size increases in the `.text` section (1992 bytes => 2048+ bytes; it no longer fits) due to new overhead from how `critical_section::free(f)` is inlined in my main application's functions. Specifically, if the closure `f` passed to `critical_section::free(f)` has a sufficiently complex branch, `rustc` will duplicate the body of `release` across both sides of the branch, even when `lto="fat"` and `opt-level="s"`.

Calling `critical_section::free()` is essential for sharing non-atomic data between interrupts/threads in a bare-metal application. To minimize interrupt latency and maximize the amount of work that can be done, the size/speed overhead of these calls should be kept as small as possible. I don't understand why Rust is unable to inline calls to `critical_section::free(f)` without duplicating the body of `release` (when `lto="fat"` and `codegen-units=1` are enabled), regardless of the following scenarios:

- `acquire()`, `release()`, and `free()` are all provided inline by the main binary.
- `acquire()`, `release()`, and `free()` are all provided by the same crate (via `use` statements, no `extern "Rust"`).
- `free()` is provided by one crate (via `use`), `acquire()` and `release()` are provided by another (via `use`).
- `free()` is provided by one crate (via `use`), `extern "Rust" acquire()` and `extern "Rust" release()` are provided by another crate.

For the MCVE the body of `release` is exaggerated; the actual size difference will vary depending on the application. From my own testing, real `thumbv6m-none-eabi` applications have the duplication, but on average are affected less than `msp430-none-elf`.
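
For reference, the shape of code that triggers the duplication (a closure with a `?` branch passed to `free()`, which then calls `release()`) can be sketched as below. This is not the code from the `msp430-size` repository: `checked_half`, `run`, and the `RELEASES` counter are hypothetical names, and a counter stands in for the ten-`nop` body so that the sketch runs on a hosted target.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts executions of `release`. The MCVE uses a sled of ten `nop`s instead,
// so that each inlined copy is easy to spot in the disassembly.
static RELEASES: AtomicUsize = AtomicUsize::new(0);

fn release() {
    RELEASES.fetch_add(1, Ordering::SeqCst);
}

/// Calls `f`, then `release()`, and hands back whatever `f` returned.
fn free<R>(f: impl FnOnce() -> R) -> R {
    let r = f();
    release();
    r
}

// Hypothetical fallible step, present only so the closure has a `?` branch.
fn checked_half(x: u32) -> Result<u32, &'static str> {
    if x % 2 == 0 {
        Ok(x / 2)
    } else {
        Err("odd input")
    }
}

fn run(x: u32) -> Result<u32, &'static str> {
    free(|| -> Result<u32, &'static str> {
        let half = checked_half(x)?; // early-exit path out of the closure
        Ok(half + 1) // fall-through path
    })
}

fn main() {
    println!("{:?} {:?}", run(4), run(3));
    println!("release ran {} times", RELEASES.load(Ordering::SeqCst));
}
```

Semantically, `release()` runs exactly once per call to `free()` on either path out of the closure. The issue concerns the optimizer emitting a copy of the inlined `release` body on each path rather than a single copy after the paths rejoin.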