[BUG] #1833
Comments
This is not a bug in the general sense; it's working as intended. But indeed, I'm not quite sure whether the intention itself is correct.
This is very much intended, and yes, we could go either way on it; this is the way we've chosen. The Time is the perceived time the work takes; i.e., by adding more threads you would expect (if the work is parallelisable) that it would take less time. If, however, your algorithm were locking between threads, you might not see this, and that would be surprising. Changing this would be very disruptive, so it's not something we can easily entertain, even if we wanted to.
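For illustration (my own minimal sketch, not code from this thread), this is the kind of parallelisable benchmark where those semantics read naturally: with no locking, the reported Time at threads:4 should drop to roughly a quarter of the threads:1 Time, while CPU per iteration stays roughly flat.

```cpp
#include <benchmark/benchmark.h>

static void BM_IndependentWork(benchmark::State& state) {
  for (auto _ : state) {
    // Pure CPU work with no shared state and no locking, so threads
    // never wait on each other.
    double acc = 0.0;
    for (int i = 0; i < 100000; ++i) acc += i * 0.5;
    benchmark::DoNotOptimize(acc);
  }
}
// Run once single-threaded and once with four threads for comparison.
BENCHMARK(BM_IndependentWork)->Threads(1)->Threads(4);

BENCHMARK_MAIN();
```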
Hi @dmah42,

Thanks for your quick answer. It's an honour to try to contribute to this great tool. I understand that this issue may require several changes downstream. However, I can't find it right that code which explicitly takes 1s to run in one thread ends up reported as taking less time just because more threads were added.

A question for you: would you agree that code that just has a wait time of 1s should be reported as having taken 1s to run, regardless of the number of threads?

Digging deeper into the code: the following function

```cpp
BenchmarkRunner::IterationResults BenchmarkRunner::DoNIterations() {
  ...
  RunInThread(&b, iters, 0, manager.get(), perf_counters_measurement_ptr);
```

calls `RunInThread()`, which accumulates the per-thread results:

```cpp
State st =
    b->Run(iters, thread_id, &timer, manager, perf_counters_measurement);
BM_CHECK(st.skipped() || st.iterations() >= st.max_iterations)
    << "Benchmark returned before State::KeepRunning() returned false!";
{
  MutexLock l(manager->GetBenchmarkMutex());
  internal::ThreadManager::Result& results = manager->results;
  results.iterations += st.iterations();
  results.cpu_time_used += timer.cpu_time_used();
  results.real_time_used += timer.real_time_used();
  results.manual_time_used += timer.manual_time_used();
  results.complexity_n += st.complexity_length_n();
  internal::Increment(&results.counters, st.counters);
}
```

But then, further down, the accumulated times are divided by the thread count:

```cpp
// Adjust real/manual time stats since they were reported per thread.
i.results.real_time_used /= b.threads();
i.results.manual_time_used /= b.threads();
// If we were measuring whole-process CPU usage, adjust the CPU time too.
if (b.measure_process_cpu_time()) i.results.cpu_time_used /= b.threads();
```

So, you're averaging the times over the number of threads, but the number of iterations doesn't get averaged, which causes the discrepancy in measurements that use the real (or manual) time. Since the averaging happens later with the total number of iterations, eliminating the time adjustment (the division-by-threads lines above) would resolve it (see the worked example after this comment).

Again, this would only be relevant for benchmarks that use the real (or manual) time.

Please let me know what you think.

Thanks,
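To make the arithmetic in the comment above concrete, here is a worked example with my own illustrative numbers (not taken from the thread), following the two code paths quoted:

```
threads            n = 4
per-thread iters   iters = 1, each iteration waits 1s
summed real time   real_time_used = 4 × 1s = 4s   (accumulated once per thread)
after adjustment   real_time_used /= n  ->  1s
total iterations   iterations = 4 × 1 = 4          (not divided by n)

reported Time ≈ real_time_used / iterations = 1s / 4 = 0.25s
```

Each iteration visibly took 1s of wall time, yet the reported Time is 0.25s, because the time was averaged over threads while the iteration count was not.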
I've opened #1834 to add further information.
Describe the bug
Multi-threaded benchmarks always report the time spent divided by the number of threads.
System
Which OS, compiler, and compiler version are you using:
To reproduce
Steps to reproduce the behaviour:
Use the following code:
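(The original snippet did not survive extraction; the following is a guessed minimal reproduction consistent with the discussion, in which each iteration simply waits 1s and wall-clock time is measured.)

```cpp
#include <benchmark/benchmark.h>

#include <chrono>
#include <thread>

// Hypothetical reconstruction: every iteration sleeps for exactly 1s.
static void BM_Sleep1s(benchmark::State& state) {
  for (auto _ : state) {
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}
// Run with 1, 2, and 4 threads, reporting wall-clock (real) time.
BENCHMARK(BM_Sleep1s)->UseRealTime()->Threads(1)->Threads(2)->Threads(4);

BENCHMARK_MAIN();
```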
Sample output:
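(The output was also lost in extraction. Illustratively, given the division by the thread count described below, it would have this shape; the numbers and units here are mine, not an actual run:)

```
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_Sleep1s/real_time/threads:1    1.00 s         0.05 ms            1
BM_Sleep1s/real_time/threads:2    0.50 s         0.05 ms            2
BM_Sleep1s/real_time/threads:4    0.25 s         0.05 ms            4
```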
Expected behavior
I would expect the time to always be like the one in `*threads:1`, because the visible time taken doesn't actually change.

Additional context
Investigating the code base (and testing), I've found out that `BenchmarkReporter::Run::GetAdjustedRealTime()` doesn't consider the number of threads in its formula. The following change does the job.
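(The actual diff did not survive extraction. A sketch of the kind of change described, based on the current shape of `BenchmarkReporter::Run::GetAdjustedRealTime()` in src/reporter.cc and the existing `threads` field on `Run`, might look like this; it is my reconstruction, not the patch actually attached to the issue:)

```cpp
double BenchmarkReporter::Run::GetAdjustedRealTime() const {
  double new_time = real_accumulated_time * GetTimeUnitMultiplier(time_unit);
  if (iterations != 0) new_time /= static_cast<double>(iterations);
  // Multiply the thread count back in, undoing the per-thread averaging
  // done in BenchmarkRunner::DoNIterations(), so a 1s wait reports as 1s.
  new_time *= static_cast<double>(threads);
  return new_time;
}
```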
BTW: Someone reported something similar at https://groups.google.com/g/benchmark-discuss/c/bWwa0QcZenc.
The fix seems quite trivial, which is exactly why I have my doubts about the absence of the thread count in the formula, so I preferred to raise this issue for someone more experienced with the original code.