`x.py bench` should consider warning/informing the user if they have a configuration that may produce slower/inaccurate benchmarks #93683
Comments
I would rather not try to produce diagnostics about arbitrary configuration deltas, since that feels pretty hard to me; we have a lot of options, and the level to which they are benign is hard to tell (and somewhat dependent on local environment). In general, it is also difficult to effectively surface warnings in `x.py`, because we typically produce a lot of output, and so each line individually is not too noticeable.

It also seems like for benchmarking in particular, beyond just the options in `config.toml`, you'd presumably want to make sure the local machine is otherwise idle, etc. Maybe that's more obvious than `config.toml` changes, not sure.
Continuing from @the8472's comment: #90414 (comment)
I'm not sure how that would help? I think there are two cases that are common for me to want (I cannot speak to others, but they may generalize):

Arguably, it would be better to always do the first, and compare separate runs. This kinda makes it a pain to regenerate benchmarks (maybe it's fine, and you just wouldn't include old benchmarks in the updates), but more importantly it causes inaccurate results on bursty machines like some laptops (mine). Measuring everything together increases the likelihood that the numbers will be comparable with each other by a lot, IME.

That said, I think you're asking for neither, and are interested in faster iteration? That's reasonable too (esp. if you have a stable benchmarking machine), but IDK if it makes sense as a default, since people probably expect at least one of the above situations to be the case (or is it just me? maybe my expectations don't generalize).

Anyway, I think there are probably good reasons to have any possible build configuration when benchmarking, so even a warning is possibly too strong. I will note that if the message is too subtle, I'd have missed it -- I can't check, but I suspect the build did say something about it.

OTOH maybe the configuration/x.py/whatever should be set up so that this does something slightly more reasonable? That wouldn't help me for the debug-info case, but TBH that one is my fault, since I literally forgot I turned it on. (It also may not even matter, if I'm wrong about the spills or if LLVM handles it.)
Perhaps another set of options for benchmarking is useful? Cargo has `[profile.bench]` for this.
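For reference, Cargo's bench profile lets a crate pin codegen settings just for `cargo bench`. A minimal sketch, with illustrative values rather than Cargo's defaults:

```toml
# Cargo.toml -- example override of the bench profile so that benchmark
# builds always use the same, predictable codegen settings.
# The values below are illustrative choices, not Cargo's defaults.
[profile.bench]
opt-level = 3        # always optimize benchmark builds
debug = false        # no debuginfo that could perturb codegen
codegen-units = 1    # avoid CGU-dependent inlining differences
incremental = false  # incremental codegen splits CGUs unpredictably
lto = true           # optional: allow cross-crate inlining, closer to a release build
```

Something analogous for bootstrap would be a bench-specific set of `config.toml` defaults, which is roughly what is being floated here.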
This matters, is pretty obvious, and you can address it with more runs; but if the codegen settings are skewed, you can have very high confidence that one impl is faster than another, when really things are just compiled in a way that optimizes a case that wouldn't get optimized in the real world. For example, in my case (which I don't want to lean on too hard, since my confidence that this is indeed what happened is at most 80%), the build settings seemed such that:

Presumably because one was a cross-crate call and one was not? Dunno! Anyway, if the answer is we shouldn't do this, fair enough. Hopefully perf will notice any issues anyway? Also, in my case, I had already kinda noticed that the numbers in std were less reliable than in a separate crate (which I used to write the code without having to constantly build std), but mostly chalked it up to "the stdlib is weird".
My suggestion is aimed at achieving a middle ground (perhaps even improving the Pareto frontier) between benchmark noise and iteration speed. I can already disable turbo clocks and compile with 1 CGU to reduce noise, but this is extremely tedious, especially when running stage1 benchmarks.
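For concreteness, this is roughly what that manual tweaking looks like in bootstrap's `config.toml`; a sketch only, and the exact key names may differ between bootstrap versions (see `config.example.toml` for the current spellings):

```toml
# config.toml -- settings one might flip by hand before benchmarking to
# reduce codegen-related noise. Key names follow this thread; check
# config.example.toml for the exact current spellings.
[rust]
incremental = false    # incremental codegen splits CGUs unpredictably
codegen-units = 1      # more consistent inlining, at the cost of build time
debug-info-std = false # the setting that caused confusion in #90414
                       # (spelled debuginfo-level-std = 0 in some bootstrap versions)
```

Disabling turbo clocks is a separate, OS-level step on top of this, which is part of why doing it all by hand for every run is tedious.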
I think adding per-profile options definitely seems like it's probably too much: `x.py` already has a lot of knobs that confuse folks, and expanding those into per-profile variants would only add more. Plus, having `x.py bench` blow away your whole `x.py` build cache (or need to cache separately) feels subtly annoying too.
We don't run std benchmarks anywhere right now, so not unless the functions happen to be used (and somewhat hot) in the compiler. It's a long-standing desire to start running those benchmarks somewhere, but that just hasn't happened yet.
Found this comment from the PR now -- I think this may be a viable step. I'm not sure to what extent it's a false hope -- if your function is on `core::str::Chars`, it won't get monomorphized outside std (unless it's `#[inline]`), so you're still getting all the negatives of incremental mode, I think. I'd not be opposed to a PR making this slight adjustment, but I think it's probably not very helpful in terms of practical improvement here.
It might not help in this particular case, but I have had std benches involving generic code where non-incremental mode still made them less noisy.
Should `x.py bench` emit a message if settings that are bad for performance are enabled? I think my configuration was confusing me quite a bit in #90414 -- specifically `rust.debug-info-std` (which IIUC impacts which variables get spilled to the stack) and `rust.incremental` (which impacts codegen units). There are definitely others relevant here.

Note that I do think there are reasons you may want to have these on when running the benchmark, so it should be an unobtrusive warning or message. Alternatively, it could be configured in some way automatically, if not overridden, similar to `[profile.bench]` in Cargo.

Discussion carried over from that PR.
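To make the "configured automatically, if not overridden" alternative concrete, one shape it could take is a bench-specific override table in `config.toml`, loosely mirroring Cargo's profiles. This is entirely hypothetical -- no such section exists in bootstrap today, and every name below is invented for illustration:

```toml
# HYPOTHETICAL config.toml sketch -- this section does not exist in
# bootstrap; it only illustrates the "auto-configure unless overridden"
# idea from this issue. All section and key names here are invented.
[rust.bench]            # hypothetical: would apply only to `x.py bench`
incremental = false     # bench-only default, unless the user overrides it
codegen-units = 1       # bench-only default for more stable inlining
debug-info-std = false  # bench-only default for the setting from #90414
```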