`x.py bench` should consider warning/informing the user if they have a configuration that may produce slower/inaccurate benchmarks #93683
Comments
I would rather not try to produce diagnostics about arbitrary configuration deltas, since that feels pretty hard to me; we have a lot of options, and the level to which they are benign is hard to tell (and somewhat dependent on local environment). In general, it is also difficult to effectively surface warnings in `x.py`, because we typically produce a lot of output, and so each line individually is not too noticeable.

It also seems like for benchmarking in particular, beyond just the options in `config.toml`, you'd presumably want to make sure the local machine is otherwise idle, etc. Maybe that's more obvious than `config.toml` changes, not sure.
Continuing from @the8472's comment: #90414 (comment)
I'm not sure how that would help? I think there are two cases that are common for me to want (I cannot speak to others, but they may generalize):

Arguably, it would be better to always do the first, and compare separate runs. This kinda makes it a pain to regenerate benchmarks (maybe it's fine, and you just wouldn't include old benchmarks in the updates), but more importantly it causes inaccurate results on bursty machines like some laptops (mine). Measuring everything together increases the likelihood that the numbers will be comparable with each other by a lot, IME.

That said, I think you're asking for neither, and are interested in faster iteration? That's reasonable too (esp. if you have a stable benchmarking machine), but IDK if it makes sense as a default, since people probably expect at least one of the above situations to be the case (or is it just me? maybe my expectations don't generalize).

Anyway, I think there are probably good reasons to have any possible build configuration when benchmarking, so even a warning is possibly too strong. I will note that if the message is too subtle, I'd have missed it -- I can't check, but I suspect the build did say something about it.

OTOH maybe the configuration/x.py/whatever should be set up so that this does something slightly more reasonable? That wouldn't help me for the debug-info case, but TBH that one is my fault, since I literally forgot I turned it on. (It also may not even matter, if I'm wrong about the spills or if LLVM handles it.)
Perhaps another set of options for benchmarking is useful? Cargo has `[profile.bench]` for this.
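For reference, Cargo's bench profile lets a crate pin codegen settings just for `cargo bench`. A minimal sketch, with illustrative values rather than Cargo's defaults:

```toml
# Cargo.toml -- example override of the bench profile so that benchmark
# builds always use the same, predictable codegen settings.
# The values below are illustrative choices, not Cargo's defaults.
[profile.bench]
opt-level = 3        # always optimize benchmark builds
debug = false        # no debuginfo that could perturb codegen
codegen-units = 1    # avoid CGU-dependent inlining differences
incremental = false  # incremental codegen splits CGUs unpredictably
lto = true           # optional: allow cross-crate inlining, closer to a release build
```

Something analogous for bootstrap would be a bench-specific set of `config.toml` defaults, which is roughly what is being floated here.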
This matters, is pretty obvious, and you can address it with more runs; but if the codegen settings are skewed, you can have very high confidence that one impl is faster than another, when really things are just compiled in a way that optimizes a case that wouldn't get optimized in the real world. For example, in my case (which I don't want to lean on too hard, since my confidence that this is indeed what happened is at most 80%), the build settings seemed such that:

Presumably because one was a cross-crate call and one was not? Dunno! Anyway, if the answer is we shouldn't do this, fair enough. Hopefully perf will notice any issues anyway? Also, in my case, I had already kinda noticed that the numbers in std were less reliable than in a separate crate (which I used to write the code without having to constantly build std), but mostly chalked it up to "the stdlib is weird".
My suggestion is aimed at achieving a middle ground (perhaps even improving the Pareto frontier) between benchmark noise and iteration speed. I can already disable turbo clocks and compile with 1 CGU to reduce noise, but this is extremely tedious, especially when running stage1 benchmarks.
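For concreteness, this is roughly what that manual tweaking looks like in bootstrap's `config.toml`; a sketch only, and the exact key names may differ between bootstrap versions (see `config.example.toml` for the current spellings):

```toml
# config.toml -- settings one might flip by hand before benchmarking to
# reduce codegen-related noise. Key names follow this thread; check
# config.example.toml for the exact current spellings.
[rust]
incremental = false    # incremental codegen splits CGUs unpredictably
codegen-units = 1      # more consistent inlining, at the cost of build time
debug-info-std = false # the setting that caused confusion in #90414
                       # (spelled debuginfo-level-std = 0 in some bootstrap versions)
```

Disabling turbo clocks is a separate, OS-level step on top of this, which is part of why doing it all by hand for every run is tedious.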
I think adding per-profile options definitely seems like it's probably too much: `x.py` already has a lot of knobs that confuse folks, and expanding those into per-profile variants would only add more. Plus, having `x.py bench` blow away your whole `x.py` build cache (or need to cache separately) feels subtly annoying too.
We don't run std benchmarks anywhere right now, so not unless the functions happen to be used (and somewhat hot) in the compiler. It's a long-standing desire to start running those benchmarks somewhere, but that just hasn't happened yet.
Found this comment from the PR now -- I think this may be a viable step. I'm not sure to what extent it's a false hope -- if your function is on `core::str::Chars`, it won't get monomorphized outside std (unless it's `#[inline]`), so you're still getting all the negatives of incremental mode, I think. I'd not be opposed to a PR making this slight adjustment, but I think it's probably not very helpful in terms of practical improvement here.
It might not help in this particular case, but I have had std benches involving generic code where non-incremental mode still made them less noisy.
Should `x.py bench` emit a message if settings that are bad for performance are enabled? I think my configuration was confusing me quite a bit in #90414 -- specifically `rust.debug-info-std` (which IIUC impacts which variables get spilled to the stack) and `rust.incremental` (which impacts codegen units). There are definitely others relevant here.

Note that I do think there are reasons you may want to have these on when running the benchmark, so it should be an unobtrusive warning or message. Alternatively, it could be configured in some way automatically, if not overridden, similar to `[profile.bench]` in Cargo.

Discussion carried over from that PR.
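To make the "configured automatically, if not overridden" alternative concrete, one shape it could take is a bench-specific override table in `config.toml`, loosely mirroring Cargo's profiles. This is entirely hypothetical -- no such section exists in bootstrap today, and every name below is invented for illustration:

```toml
# HYPOTHETICAL config.toml sketch -- this section does not exist in
# bootstrap; it only illustrates the "auto-configure unless overridden"
# idea from this issue. All section and key names here are invented.
[rust.bench]            # hypothetical: would apply only to `x.py bench`
incremental = false     # bench-only default, unless the user overrides it
codegen-units = 1       # bench-only default for more stable inlining
debug-info-std = false  # bench-only default for the setting from #90414
```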