Segmentation fault when formatting u128 on aarch64 GNU/Linux #102196

prestontimmons · 2022-09-23T15:33:35Z

Hello, we've noticed segmentation faults when running Rust binaries compiled on aarch64 GNU/Linux. We've seen this occur in multiple libraries that format or print SystemTime.

Architecture:

uname -a

5.10.135-122.509.amzn2.aarch64 #1 SMP Thu Aug 11 22:41:14 UTC 2022 aarch64 GNU/Linux

Reproducible example:

fn main() {
    let millis: u128 = 87329875;
    println!("{}", millis);
}

The segmentation fault occurs when fmt_u128 is called.
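For context on why SystemTime-related code ends up on this path: `Duration::as_millis` returns a `u128`, so printing a millisecond timestamp goes through the same `u128` formatting routine as the example above. The following is only a sketch of that pattern; the helper name `epoch_millis` is made up for illustration and is not taken from any of the affected libraries.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: milliseconds since the Unix epoch.
/// `Duration::as_millis` returns a u128, which is why timestamp
/// code ends up formatting 128-bit integers.
fn epoch_millis() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before the Unix epoch")
        .as_millis()
}

fn main() {
    let millis = epoch_millis();
    // Formatting a u128 with `{}` is the operation that crashed in the report.
    println!("{}", millis);
}
```

If this variant crashes in the same environment while printing a `u64` value does not, that would point at the `u128` formatting path rather than at SystemTime itself.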

I tested this on 1.62.0 and nightly:

/builds/scratch# cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/scratch`
Segmentation fault (core dumped)

# rustc --version --verbose
rustc 1.62.0 (a8314ef7d 2022-06-27)
binary: rustc
commit-hash: a8314ef7d0ec7b75c336af2c9857bfaf43002bfc
commit-date: 2022-06-27
host: aarch64-unknown-linux-gnu
release: 1.62.0
LLVM version: 14.0.5

# cargo +nightly run
    Finished dev [unoptimized + debuginfo] target(s) in 0.24s
     Running `target/debug/scratch`
Segmentation fault (core dumped)

# rustc +nightly --version --verbose
rustc 1.66.0-nightly (e7119a030 2022-09-22)
binary: rustc
commit-hash: e7119a0300b87a3d670408ee8e847c6821b3ae80
commit-date: 2022-09-22
host: aarch64-unknown-linux-gnu
release: 1.66.0-nightly
LLVM version: 15.0.0

The segmentation fault does not occur in release mode:

# cargo run --release
    Finished release [optimized] target(s) in 0.23s
     Running `target/release/scratch`
87329875

It also does not occur if opt-level is set to greater than 0:

[profile.dev]
opt-level = 1

It also does not occur on Darwin aarch64:

uname -a

Darwin TC-4000660 21.6.0 Darwin Kernel Version 21.6.0: Wed Aug 10 14:28:23 PDT 2022; root:xnu-8020.141.5~2/RELEASE_ARM64_T6000 arm64

Meta

Valgrind traceback:

Backtrace

# valgrind target/debug/scratch
==5157== Memcheck, a memory error detector
==5157== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==5157== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==5157== Command: target/debug/scratch
==5157== 
==5157== Invalid read of size 4
==5157==    at 0x112474: alternate (mod.rs:1893)
==5157==    by 0x112474: core::fmt::Formatter::pad_integral (mod.rs:1366)
==5157==    by 0x111BBB: core::fmt::num::fmt_u128 (num.rs:641)
==5157==    by 0x112347: core::fmt::write (mod.rs:1202)
==5157==    by 0x15D5FB: write_fmt<std::io::stdio::StdoutLock> (mod.rs:1679)
==5157==    by 0x15D5FB: <&std::io::stdio::Stdout as std::io::Write>::write_fmt (stdio.rs:715)
==5157==    by 0x15E133: write_fmt (stdio.rs:689)
==5157==    by 0x15E133: print_to<std::io::stdio::Stdout> (stdio.rs:1017)
==5157==    by 0x15E133: std::io::stdio::_print (stdio.rs:1030)
==5157==    by 0x10CDCB: scratch::main (main.rs:3)
==5157==    by 0x10CEA3: core::ops::function::FnOnce::call_once (function.rs:251)
==5157==    by 0x11B3AB: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:122)
==5157==    by 0x17925F: std::rt::lang_start::{{closure}} (rt.rs:166)
==5157==    by 0x15B38B: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:286)
==5157==    by 0x15B38B: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:464)
==5157==    by 0x15B38B: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:428)
==5157==    by 0x15B38B: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:137)
==5157==    by 0x15B38B: {closure#2} (rt.rs:148)
==5157==    by 0x15B38B: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:464)
==5157==    by 0x15B38B: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:428)
==5157==    by 0x15B38B: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:137)
==5157==    by 0x15B38B: std::rt::lang_start_internal (rt.rs:148)
==5157==    by 0x17922B: std::rt::lang_start (rt.rs:165)
==5157==    by 0x10CE07: main (in /builds/scratch/target/debug/scratch)
==5157==  Address 0x31 is not stack'd, malloc'd or (recently) free'd
==5157== 
==5157== 
==5157== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==5157==  Access not within mapped region at address 0x57
==5157==    at 0x112474: alternate (mod.rs:1893)
==5157==    by 0x112474: core::fmt::Formatter::pad_integral (mod.rs:1366)
==5157==    by 0x111BBB: core::fmt::num::fmt_u128 (num.rs:641)
==5157==    by 0x112347: core::fmt::write (mod.rs:1202)
==5157==    by 0x15D5FB: write_fmt<std::io::stdio::StdoutLock> (mod.rs:1679)
==5157==    by 0x15D5FB: <&std::io::stdio::Stdout as std::io::Write>::write_fmt (stdio.rs:715)
==5157==    by 0x15E133: write_fmt (stdio.rs:689)
==5157==    by 0x15E133: print_to<std::io::stdio::Stdout> (stdio.rs:1017)
==5157==    by 0x15E133: std::io::stdio::_print (stdio.rs:1030)
==5157==    by 0x10CDCB: scratch::main (main.rs:3)
==5157==    by 0x10CEA3: core::ops::function::FnOnce::call_once (function.rs:251)
==5157==    by 0x11B3AB: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:122)
==5157==    by 0x17925F: std::rt::lang_start::{{closure}} (rt.rs:166)
==5157==    by 0x15B38B: call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (function.rs:286)
==5157==    by 0x15B38B: do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panicking.rs:464)
==5157==    by 0x15B38B: try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> (panicking.rs:428)
==5157==    by 0x15B38B: catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> (panic.rs:137)
==5157==    by 0x15B38B: {closure#2} (rt.rs:148)
==5157==    by 0x15B38B: do_call<std::rt::lang_start_internal::{closure_env#2}, isize> (panicking.rs:464)
==5157==    by 0x15B38B: try<isize, std::rt::lang_start_internal::{closure_env#2}> (panicking.rs:428)
==5157==    by 0x15B38B: catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> (panic.rs:137)
==5157==    by 0x15B38B: std::rt::lang_start_internal (rt.rs:148)
==5157==    by 0x17922B: std::rt::lang_start (rt.rs:165)
==5157==    by 0x10CE07: main (in /builds/scratch/target/debug/scratch)
==5157==  If you believe this happened as a result of a stack
==5157==  overflow in your program's main thread (unlikely but
==5157==  possible), you can try to increase the size of the
==5157==  main thread stack using the --main-stacksize= flag.
==5157==  The main thread stack size used in this run was 10485760.
==5157== 
==5157== HEAP SUMMARY:
==5157==     in use at exit: 1,109 bytes in 4 blocks
==5157==   total heap usage: 9 allocs, 5 frees, 2,997 bytes allocated
==5157== 
==5157== LEAK SUMMARY:
==5157==    definitely lost: 0 bytes in 0 blocks
==5157==    indirectly lost: 0 bytes in 0 blocks
==5157==      possibly lost: 0 bytes in 0 blocks
==5157==    still reachable: 1,109 bytes in 4 blocks
==5157==         suppressed: 0 bytes in 0 blocks
==5157== Rerun with --leak-check=full to see details of leaked memory
==5157== 
==5157== For lists of detected and suppressed errors, rerun with: -s
==5157== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault


Noratrieb · 2022-09-25T16:22:36Z

I can't reproduce this segfault inside an arm64 Docker container on an x86_64 host, so it seems to require real hardware and does not reproduce under QEMU.
Linux 8ae19ce495c5 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

saethlin · 2022-09-25T17:09:24Z

Thus far I cannot reproduce this issue. Perhaps because I'm on

Linux alarm 5.19.8-1-aarch64-ARCH #1 SMP PREEMPT Thu Sep 8 18:20:33 MDT 2022 aarch64 GNU/Linux

Is the above output from a graviton2 instance?

thomcc · 2022-09-25T17:09:32Z

Are you using mold as your linker by any chance? Seems somewhat similar to #101247.

prestontimmons · 2022-09-25T19:53:19Z

Thanks for looking into this.

I also have not been able to reproduce this on x86_64 or in an emulated Docker container running on x86_64.
Yes, it is a graviton2 instance using 5.10.135-122.509.amzn2.aarch64.
No, this is using the default linker. mold has not been added.

I did some more testing and found an interesting result. When I use cargo run directly on the host, the segmentation fault does not occur, but I see it consistently in the Docker runner on that host (this is part of our CI). The Docker image is based on rust:1.62-slim-bullseye.

I'll dig deeper and find a more specific setup that reproduces it.

Dirreke · 2023-09-19T13:27:21Z

I hit a similar issue on the csky architecture when using println!.

A small u128 prints the wrong result:

let a = 0_u128;
println!("{a}"); //14082568811966739713
let a = 1_u128;
println!("{a}"); //14082568811966739714
let a = 10_u128.pow(18);
println!("{a}"); //15082568811966739713
let a = 10_u128.pow(19);
println!("{a}"); //140825688119667397140000000421709631291

A large u128 causes a segmentation fault:

let a = 2_u128.pow(84);
println!("{a}"); //segmentation fault

The u128 arithmetic itself is correct, and the other format types print correctly:

let a = 2_u128.pow(84) ;
println!("{:b}", a); //1000000000000000000000000000000000000000000000000000000000000000000000000000000000000
let a = (0_u128 + 1_u128 ) as u64;
println!("{:b}", a); //1

I'm working on porting code to csky, which is a niche architecture. I added csky support to Rust in #113658 and to libc in rust-lang/libc#3301.

I'm not sure what caused this issue. It looks similar to yours. It confuses me, and I don't know whether it comes from my own ignorance in my PR or from an error in some other code.
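One way to turn the observations above into a self-checking program is to compare the `{}` output with a decimal conversion that bypasses the `Display` implementation for u128. This is a sketch under the assumption, stated above, that u128 arithmetic itself is correct on the target; `manual_decimal` is a hypothetical helper written for this comparison, not part of the standard library.

```rust
/// Convert a u128 to decimal using only u128 division and remainder,
/// without going through the `Display` implementation for u128.
fn manual_decimal(mut v: u128) -> String {
    if v == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while v > 0 {
        // `v % 10` is always in 0..=9, so the cast to u8 cannot truncate.
        digits.push(b'0' + (v % 10) as u8);
        v /= 10;
    }
    // Digits were produced least-significant first.
    digits.reverse();
    String::from_utf8(digits).expect("ASCII digits are valid UTF-8")
}

fn main() {
    // The same values as in the comment above, plus u128::MAX.
    let samples = [
        0_u128,
        1,
        10_u128.pow(18),
        10_u128.pow(19),
        2_u128.pow(84),
        u128::MAX,
    ];
    for v in samples {
        let expected = manual_decimal(v);
        // This is the call that misbehaved in the report.
        let actual = format!("{}", v);
        // Only strings are printed here, so the report itself does not
        // depend on u128 formatting.
        if actual == expected {
            println!("ok       {}", expected);
        } else {
            println!("MISMATCH expected {} got {}", expected, actual);
        }
    }
}
```

On a target where formatting works, every line should start with "ok". A mismatch or a crash partway through the list narrows down which magnitudes are affected.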

saethlin · 2023-09-19T17:22:29Z

Nobody ever came up with a reproducer for the original report. I just spun up a few graviton instances and tried again, and I couldn't reproduce the originally reported crash.

I'm sure we could help out if you can come up with a reproducer that doesn't require owning some niche hardware. Is there an emulator people can run?

Failing that, I'd try reporting this problem to your local expert on your arch. I strongly suspect that whatever is going on here is not too Rust-specific. This is probably an LLVM or linker problem, so anyone who can reproduce the problem and is experienced with low-level debugging could really help us out here by identifying what has gone wrong with the codegen. If this happens without optimizations, it's probably fairly localized. For example, it would help if someone can point out "The instructions look good up until this one, at which point it makes no sense. The executable should contain these instructions instead."
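For anyone attempting that kind of codegen inspection, it may help to isolate the formatting call in a small out-of-line function so it is easy to locate in a disassembly. The sketch below is one possible layout, not a confirmed reproducer; `format_u128` is a hypothetical wrapper, and `std::hint::black_box` is only stable from Rust 1.66 onward, so older toolchains would need another way to supply a non-constant value.

```rust
use std::fmt::Write;
use std::hint::black_box;

/// Hypothetical wrapper, kept out of line so the call into the u128
/// formatting code is easy to find when reading the disassembly.
#[inline(never)]
fn format_u128(value: u128) -> String {
    let mut out = String::new();
    // Writing into a String avoids the stdout locking frames seen in
    // the Valgrind trace, which keeps the call stack short.
    write!(out, "{}", value).expect("writing to a String cannot fail");
    out
}

fn main() {
    // black_box asks the optimizer to treat the value as unknown, so the
    // formatting is less likely to be folded away at higher opt-levels.
    let value = black_box(87329875_u128);
    let text = format_u128(value);
    println!("{}", text);
}
```

Building this in debug mode and disassembling `format_u128` with a tool such as objdump gives a much smaller region to compare between a working and a failing environment.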

Dirreke · 2023-10-20T16:09:13Z

Fixed it by llvm/llvm-project#69732.

kpreid · 2023-12-25T17:01:19Z

Triage: Relabeling issues which don't have a runnable reproduction (as opposed to having a non-minimized one) to the new label S-needs-repro.
@rustbot label +S-needs-repro -E-needs-mcve

Noratrieb · 2024-11-09T21:19:46Z

@Dirreke what's the status here? Your linked LLVM PR that was supposed to fix this was closed with "this is an error in the codegen of rust", which implies that there is something that needs to be fixed here. Is this true?

Noratrieb · 2024-11-09T21:55:18Z

@saethlin kindly reminded me that this issue has nothing to do with CSKY, as it was found on AArch64. Therefore I'm closing this, as it does not have a reproduction; people tried to get one and failed.

Dirreke · 2024-11-10T00:11:06Z

Thanks. The issue that was produced in csky has been fixed.

prestontimmons added the C-bug Category: This is a bug. label Sep 23, 2022

thomcc added the O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state label Sep 25, 2022

saethlin added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Sep 19, 2023

rustbot added S-needs-repro Status: This issue has no reproduction and needs a reproduction to make progress. and removed E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels Dec 25, 2023

workingjubilee added the O-csky Target: glaCSKY above covers over me~ label May 29, 2024

Noratrieb removed the O-csky Target: glaCSKY above covers over me~ label Nov 9, 2024

Noratrieb closed this as not planned Won't fix, can't repro, duplicate, stale Nov 9, 2024
