Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance degrades if stdlib function uses io::Error::new() with &'static str #82812

Closed
Frago9876543210 opened this issue Mar 5, 2021 · 6 comments · Fixed by #83555
Closed
Labels
C-bug Category: This is a bug.

Comments

@Frago9876543210
Copy link

Frago9876543210 commented Mar 5, 2021

I am facing a problem while reading different kinds of binary formats. If you are going to read primitives like i32 too often, then performance degrades. This is mainly due to the fact that Error::new(kind, &'static str) returns box. Also it does boxing TWICE!

//stdlib code
enum Repr {
    Os(i32),
    Simple(ErrorKind),
    Custom(Box<Custom>),
}

#[derive(Debug)]
struct Custom {
    kind: ErrorKind,
    error: Box<dyn error::Error + Send + Sync>,
}

doc link: https://doc.rust-lang.org/std/io/struct.Error.html#method.new

I wrote simple bench to show pref problem.

cargo new --vcs=none io-perf
cd io-perf
mkdir benches
cat /dev/null > benches/read.rs
# now you have to modify this file e.g. `subl benches/read.rs`

read.rs:

#![feature(test)]

extern crate test;

use std::hint::black_box;
use std::io::{ErrorKind, Read, Result};
use test::Bencher;

pub trait CustomRead {
    fn read_exact_fast(&mut self, buf: &mut [u8]) -> Result<()>;
}

impl CustomRead for &[u8] {
    /// Clone of `&[u8]::read_exact` without boxing `&'static str`
    #[inline]
    fn read_exact_fast(&mut self, buf: &mut [u8]) -> Result<()> {
        if buf.len() > self.len() {
            // [!] no message here:
            return Err(ErrorKind::UnexpectedEof.into());
        }
        let (a, b) = self.split_at(buf.len());

        if buf.len() == 1 {
            buf[0] = a[0];
        } else {
            buf.copy_from_slice(a);
        }

        *self = b;
        Ok(())
    }
}

impl<R: CustomRead + ?Sized> CustomRead for &mut R {
    #[inline]
    fn read_exact_fast(&mut self, buf: &mut [u8]) -> Result<()> {
        (**self).read_exact_fast(buf)
    }
}

const READ_SIZE: usize = 4; //e.g. reading int x1000 times
const LOOPS: usize = 1000;

#[bench]
fn bench_read_exact_std_err(b: &mut Bencher) {
    b.iter(|| {
        let mut buf = [0u8; READ_SIZE];
        let mut r: &[u8] = &[]; //always would be error

        for _ in 0..LOOPS {
            black_box(Read::read_exact(&mut &*r, &mut buf).is_err());
        }
    });
}

#[bench]
fn bench_read_exact_fast_err(b: &mut Bencher) {
    b.iter(|| {
        let mut buf = [0u8; READ_SIZE];
        let mut r: &[u8] = &[]; //always would be error

        for _ in 0..LOOPS {
            black_box(CustomRead::read_exact_fast(&mut &*r, &mut buf).is_err());
        }
    });
}

#[bench]
fn bench_read_exact_std_ok(b: &mut Bencher) {
    b.iter(|| {
        let mut buf = [0u8; READ_SIZE];
        let mut r: &[u8] = &[0u8; READ_SIZE]; //always would be ok

        for _ in 0..LOOPS {
            black_box(Read::read_exact(&mut &*r, &mut buf).is_ok());
        }
    });
}

#[bench]
fn bench_read_exact_fast_ok(b: &mut Bencher) {
    b.iter(|| {
        let mut buf = [0u8; READ_SIZE];
        let mut r: &[u8] = &[0u8; READ_SIZE]; //always would be ok

        for _ in 0..LOOPS {
            black_box(CustomRead::read_exact_fast(&mut &*r, &mut buf).is_ok());
        }
    });
}

Results

Use cargo bench to get it

running 4 tests
test bench_read_exact_fast_err ... bench:         255 ns/iter (+/- 239)
test bench_read_exact_fast_ok  ... bench:         256 ns/iter (+/- 9)
test bench_read_exact_std_err  ... bench:     164,789 ns/iter (+/- 11,572)
test bench_read_exact_std_ok   ... bench:         256 ns/iter (+/- 11)

We can clearly see that performance is degrading. The more calls it takes to read, the worse the results will be.

Affected crates and stdlib itself

The first very popular crate that comes to mind is byteorder.
https://github.com/BurntSushi/byteorder/blob/5d9d0386488d0763954f14a5cb02b1c0afccbdcb/src/io.rs#L84-L89

It mainly depends on the data format. But what if you need to receive data over the network? The attacker may abuse wrong data to make ton of pointless memory allocations.

stdlib is also use Error::new() often enough. Let's run a simple bash to make sure of this

cd ~/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src
grep -r --include="*.rs" -E "ErrorKind::\w+," -A 2 -B 2

In many cases, you will see that it is used &'static str. stdlib sometimes create io::Error with format!() macro, however it doesn't happen very often.

rust/library/std/src/sys/unix/os.rs-            if path_len <= 1 {
rust/library/std/src/sys/unix/os.rs-                return Err(io::Error::new(
rust/library/std/src/sys/unix/os.rs:                    io::ErrorKind::Other,
rust/library/std/src/sys/unix/os.rs-                    "KERN_PROC_PATHNAME sysctl returned zero-length string",
rust/library/std/src/sys/unix/os.rs-                ));
--
rust/library/std/src/sys/unix/os.rs-        }
rust/library/std/src/sys/unix/os.rs-        Err(io::Error::new(
rust/library/std/src/sys/unix/os.rs:            io::ErrorKind::Other,
rust/library/std/src/sys/unix/os.rs-            "/proc/curproc/exe doesn't point to regular file.",
rust/library/std/src/sys/unix/os.rs-        ))

Possible solutions

  • Consider adding a new static method such as io::ErrorKind::with_message(_: &'static str) and new internal Repr member of that's enum.
  • How about using something like Cow, but with static lifetime?

Meta

rustc --version --verbose:

rustc 1.52.0-nightly (94736c434 2021-02-27)
binary: rustc
commit-hash: 94736c434ee154b30e2ec22ec112b79e3f6c5884
commit-date: 2021-02-27
host: x86_64-unknown-linux-gnu
release: 1.52.0-nightly
LLVM version: 11.0.1

Update 07.03.2021: Benchmarks have been improved as per the comment. This should show the problem better.

@Frago9876543210 Frago9876543210 added the C-bug Category: This is a bug. label Mar 5, 2021
@SkiFire13
Copy link
Contributor

fn bench_read_exact_std_ok(b: &mut Bencher) {
    b.iter(|| {
        let mut buf = [0u8; READ_SIZE];
        let mut r: &[u8] = &[0u8; READ_SIZE]; //always would be ok

        for _ in 0..LOOPS {
            black_box(Read::read_exact(&mut r, &mut buf).is_ok());
        }
    });
}

Note that the implementation of Read for &[u8] will mutate the slice reference to point to the remainder, so here the read_exact will succeed only once because after that r will be empty. This means that you're actually testing the error path in both "ok" and "err" benches. To actually measure the happy path you'll need for example to reborrow r (i.e. call Read::read_exact(&mut &*r, &mut buf). Applying this change to every bench function leads to the following results on my computer:

test bench_read_exact_fast_err ... bench:         237 ns/iter (+/- 5)
test bench_read_exact_fast_ok  ... bench:         234 ns/iter (+/- 2)
test bench_read_exact_std_err  ... bench:     126,251 ns/iter (+/- 2,466)
test bench_read_exact_std_ok   ... bench:         233 ns/iter (+/- 5)

The "std_err" bench is still pretty slow though

@Frago9876543210
Copy link
Author

@SkiFire13 Yes, I probably made a mistake while copying the code for the benchmark of bench_read_exact_fast_ok. It should be .is_ok(). However, this shows that io::Error::new() not zero-cost.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 27, 2021

This is fixed by #83353

@m-ou-se
Copy link
Member

m-ou-se commented Mar 27, 2021

TWICE!

Turns out it was three times, if you also count the String that the &str is copied into.

@m-ou-se
Copy link
Member

m-ou-se commented Mar 27, 2021

Running your benchmark (thanks!):

Before:

test bench_read_exact_fast_err ... bench:         256 ns/iter (+/- 6)
test bench_read_exact_fast_ok  ... bench:         257 ns/iter (+/- 7)
test bench_read_exact_std_err  ... bench:      34,733 ns/iter (+/- 2,409)
test bench_read_exact_std_ok   ... bench:         256 ns/iter (+/- 8)

After:

test bench_read_exact_fast_err ... bench:         256 ns/iter (+/- 17)
test bench_read_exact_fast_ok  ... bench:         256 ns/iter (+/- 7)
test bench_read_exact_std_err  ... bench:       1,508 ns/iter (+/- 39)
test bench_read_exact_std_ok   ... bench:         256 ns/iter (+/- 7)

Looks like it isn't completely solved yet. Adding #[inline] to io::Error::new_const now... :)

@m-ou-se
Copy link
Member

m-ou-se commented Mar 27, 2021

That fixed it.

test bench_read_exact_fast_err ... bench:         258 ns/iter (+/- 10)
test bench_read_exact_fast_ok  ... bench:         256 ns/iter (+/- 9)
test bench_read_exact_std_err  ... bench:         257 ns/iter (+/- 6)
test bench_read_exact_std_ok   ... bench:         256 ns/iter (+/- 5)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants