
[BUG] #1833

Closed

manuelgustavo opened this issue Aug 5, 2024 · 4 comments

@manuelgustavo

Describe the bug
Multi-threaded benchmarks always report the time spent divided by the number of threads.

System
Which OS, compiler, and compiler version are you using:

  • OS: Linux (Ubuntu 22.04)
  • Compiler and version: clang-14

To reproduce
Steps to reproduce the behaviour:

Use the following code:

#include <benchmark/benchmark.h>

#include <chrono>
#include <thread>

using namespace std::chrono_literals;

static void simple_run(benchmark::State& state)
{
    for (auto _ : state)
    {
        std::this_thread::sleep_for(1s);
    }
}
BENCHMARK(simple_run)->ThreadRange(1, 8 << 7);
BENCHMARK_MAIN();

Sample output:

------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
simple_run/threads:1    1000144900 ns        39474 ns           10
simple_run/threads:2     500064675 ns        35772 ns           20
simple_run/threads:4     250033026 ns        33278 ns           40
simple_run/threads:8     125015169 ns        28046 ns           80
...

Expected behavior
I would expect the reported Time to always match the one in simple_run/threads:1, because the observed wall-clock time doesn't actually change.

Additional context

Investigating the code base (and testing), I found that BenchmarkReporter::Run::GetAdjustedRealTime() doesn't take the number of threads into account in its formula.

The following change does the job.

diff --git a/src/reporter.cc b/src/reporter.cc
index 076bc31..103d86e 100644
--- a/src/reporter.cc
+++ b/src/reporter.cc
@@ -105,7 +105,7 @@ std::string BenchmarkReporter::Run::benchmark_name() const {
 
 double BenchmarkReporter::Run::GetAdjustedRealTime() const {
   double new_time = real_accumulated_time * GetTimeUnitMultiplier(time_unit);
-  if (iterations != 0) new_time /= static_cast<double>(iterations);
+  if (iterations != 0) new_time /= (static_cast<double>(iterations) / static_cast<double>(threads));
   return new_time;
 }
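
To sanity-check the patch, here is a quick standalone sketch of the arithmetic (values approximated from the threads:2 row of the sample output above; variable names are mine, for illustration only):

#include <cstdio>

int main() {
  // threads:2 row: 2 threads x 10 iterations x ~1s of sleep, with the
  // real time already divided by the thread count inside RunInThread.
  const double real_accumulated_time = 10.0012935;  // seconds
  const double iterations = 20.0;                   // total across all threads
  const double threads = 2.0;
  const double multiplier = 1e9;                    // nanoseconds

  // Current formula: divide by the total iteration count.
  double current = real_accumulated_time * multiplier / iterations;
  // Patched formula: divide by the per-thread iteration count.
  double patched = real_accumulated_time * multiplier / (iterations / threads);

  std::printf("current: %.0f ns, patched: %.0f ns\n", current, patched);
  // Prints roughly: current: 500064675 ns, patched: 1000129350 ns (~1s).
}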

BTW: Someone reported something similar at https://groups.google.com/g/benchmark-discuss/c/bWwa0QcZenc.

Since the fix looks quite trivial, I have doubts about whether leaving the number of threads out of the formula is intentional, so I preferred to raise this issue for someone more experienced with the original code.

@LebedevRI
Collaborator

This is not a bug in the general sense; it's working as intended. But indeed, I'm not quite sure whether the intention is correct or not.

@dmah42
Member

dmah42 commented Aug 6, 2024

This is very much intended; yes, we could go either way on it, and this is the way we've chosen.

The Time is the perceived time the work takes; i.e., by adding more threads you would expect (if the work is parallelisable) that it would take less time. If, however, your algorithm was locking between threads, you may not see this, and that would be surprising.

Changing this would be very disruptive, so it's not something we can easily entertain, even if we wanted to.
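
For illustration, a rough sketch of the locking case (a hypothetical benchmark, not from this repo): serialising a fixed chunk of work behind a mutex keeps the reported Time roughly flat as the thread count grows, which is exactly the lack-of-speedup signal the current semantics surface.

#include <benchmark/benchmark.h>

#include <mutex>

static std::mutex mu;

static void serialized_run(benchmark::State& state) {
  for (auto _ : state) {
    std::lock_guard<std::mutex> lock(mu);  // threads contend here
    int sum = 0;
    for (int i = 0; i < 1000000; ++i) sum += i;  // fixed chunk of work
    benchmark::DoNotOptimize(sum);  // keep the loop from being optimised away
  }
}
// Unlike the sleep_for example, Time stays roughly constant as the thread
// count rises, instead of falling.
BENCHMARK(serialized_run)->ThreadRange(1, 8);
BENCHMARK_MAIN();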

@dmah42 closed this as completed Aug 6, 2024
@manuelgustavo
Author

Hi @dmah42,

Thanks for your quick answer. It's an honour to try to contribute to this great tool.

I understand that this issue may imply several changes downstream.

However, it doesn't seem right to me that code which explicitly takes 1s to run on 1 thread (as per sleep_for(1s)) gets reported as 0.5s on 2 threads, 0.25s on 4 threads, and so on.

A question for you: would you agree that code that simply waits for 1s should be reported as taking 1s to run, regardless of the number of threads?

Digging deeper into the code:

The following function DoNIterations() will call RunInThread():

BenchmarkRunner::IterationResults BenchmarkRunner::DoNIterations() {
  ...
  RunInThread(&b, iters, 0, manager.get(), perf_counters_measurement_ptr);

And RunInThread() will eventually post its results:

  State st =
      b->Run(iters, thread_id, &timer, manager, perf_counters_measurement);
  BM_CHECK(st.skipped() || st.iterations() >= st.max_iterations)
      << "Benchmark returned before State::KeepRunning() returned false!";
  {
    MutexLock l(manager->GetBenchmarkMutex());
    internal::ThreadManager::Result& results = manager->results;
    results.iterations += st.iterations();
    results.cpu_time_used += timer.cpu_time_used();
    results.real_time_used += timer.real_time_used();
    results.manual_time_used += timer.manual_time_used();
    results.complexity_n += st.complexity_length_n();
    internal::Increment(&results.counters, st.counters);
  }

But then further down in RunInThread you have:

  // Adjust real/manual time stats since they were reported per thread.
  i.results.real_time_used /= b.threads();
  i.results.manual_time_used /= b.threads();
  // If we were measuring whole-process CPU usage, adjust the CPU time too.
  if (b.measure_process_cpu_time()) i.results.cpu_time_used /= b.threads();

So the real/manual times are divided by the number of threads, but the number of iterations isn't, causing the discrepancy in measurements that use the ->Threads() functionality with more than one thread.

Since the averaging happens later over the total number of iterations, eliminating the time adjustment (all of the /= b.threads(); lines) from the code would be the best solution. This could also be made optional via a parameter (for initial testing purposes).

Again, this would only be relevant for benchmarks that use the ->Threads function.
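
To make the arithmetic concrete, a quick sketch (values approximated from the threads:2 case above; names mine) showing that both candidate fixes report the same per-iteration time:

#include <cstdio>

int main() {
  // 2 threads x 10 iterations x ~1s of sleep each.
  const double summed_real_time = 20.0;  // seconds, before any adjustment
  const double threads = 2.0;
  const double iterations = 20.0;        // total across all threads

  // Option A: keep the per-thread division in RunInThread and patch the
  // reporter to divide by per-thread iterations.
  double option_a = (summed_real_time / threads) / (iterations / threads);
  // Option B: drop the per-thread division and divide by total iterations.
  double option_b = summed_real_time / iterations;

  std::printf("A: %.2f s, B: %.2f s\n", option_a, option_b);  // both ~1.00 s
}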

Please let me know what you think.

Thanks,
Manuel

@manuelgustavo
Author

I've opened #1834 to add further information.
