
Make PauseTiming() and ResumeTiming() per thread. #286

Merged (21 commits) on Sep 3, 2016

Conversation

@EricWF (Contributor) commented Sep 2, 2016

Currently we time benchmarks using a single global timer that tracks per-process CPU usage. Pausing and resuming this timer has to act as a barrier across all threads, which has crippling effects on multi-threaded benchmarks: if you pause in every iteration you synchronize the entire benchmark, and it is effectively no longer multi-threaded.

This patch switches to per-thread timers. Instead of measuring process CPU time, we sum the CPU time of each thread, and pausing happens on a per-thread basis.
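The approach can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `ThreadTimer` and `SumThreadCpuTime` are hypothetical names, and the POSIX `CLOCK_THREAD_CPUTIME_ID` clock stands in for the platform-specific thread clocks a real implementation has to handle.

```cpp
#include <time.h>
#include <cassert>
#include <thread>
#include <vector>

// Per-thread CPU timer: each thread reads its own CLOCK_THREAD_CPUTIME_ID,
// so pausing one thread's timer never blocks the other threads.
struct ThreadTimer {
  double elapsed = 0.0, start = 0.0;
  static double ThreadCpuNow() {
    timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
  }
  void Resume() { start = ThreadCpuNow(); }
  void Pause() { elapsed += ThreadCpuNow() - start; }
};

// Run n threads that each do timed work with an untimed pause in the
// middle, then sum the per-thread CPU times (what gets reported as CPU).
double SumThreadCpuTime(int n) {
  std::vector<double> results(n);
  std::vector<std::thread> pool;
  for (int i = 0; i < n; ++i) {
    pool.emplace_back([&results, i] {
      ThreadTimer t;
      volatile long sink = 0;
      t.Resume();
      for (long j = 0; j < 1000000; ++j) sink += j;
      t.Pause();  // untimed section: no barrier, other threads keep running
      t.Resume();
      for (long j = 0; j < 1000000; ++j) sink += j;
      t.Pause();
      results[i] = t.elapsed;
    });
  }
  for (auto& th : pool) th.join();
  double total = 0.0;
  for (double r : results) total += r;
  return total;
}
```

The key property is that `Pause()` only touches thread-local state, so a pause in one thread's iteration costs nothing to the others.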

Below is a comparison of the new and old results from basic_test.cc. Note that the threaded BM_spin_pause_during tests get up to 95% faster.

Benchmark                                                    Time           CPU
-------------------------------------------------------------------------------
BM_empty_mean                                               +0.00         +0.00
BM_empty/threads:4_mean                                     +0.00         +0.00
BM_spin_empty/8_mean                                        +0.00         +0.00
BM_spin_empty/512_mean                                      +0.01         +0.00
BM_spin_empty/8k_mean                                       -0.00         -0.00
BM_spin_empty/8/threads:4_mean                              +0.00         -0.17
BM_spin_empty/512/threads:4_mean                            +0.01         +0.00
BM_spin_empty/8k/threads:4_mean                             -0.01         +0.00
BM_spin_pause_before/8_mean                                 +0.00         +0.00
BM_spin_pause_before/512_mean                               +0.03         +0.02
BM_spin_pause_before/8k_mean                                +0.01         +0.01
BM_spin_pause_before/8/threads:4_mean                       +0.00         +0.00
BM_spin_pause_before/512/threads:4_mean                     +0.04         +0.01
BM_spin_pause_before/8k/threads:4_mean                      -0.03         -0.00
BM_spin_pause_during/8_mean                                 -0.24         -0.25
BM_spin_pause_during/512_mean                               -0.24         -0.24
BM_spin_pause_during/8k_mean                                -0.13         -0.13
BM_spin_pause_during/8/threads:4_mean                       -0.97         -0.90
BM_spin_pause_during/512/threads:4_mean                     -0.96         -0.89
BM_spin_pause_during/8k/threads:4_mean                      -0.95         -0.85
BM_pause_during_mean                                        -0.23         -0.20
BM_pause_during/threads:4_mean                              -0.97         -0.90
BM_pause_during/real_time_mean                              -0.24         -0.26
BM_pause_during/real_time/threads:4_mean                    -0.97         -0.90
BM_spin_pause_after/8_mean                                  +0.00         +0.00
BM_spin_pause_after/512_mean                                +0.00         +0.00
BM_spin_pause_after/8k_mean                                 -0.00         -0.00
BM_spin_pause_after/8/threads:4_mean                        +0.00         +0.00
BM_spin_pause_after/512/threads:4_mean                      -0.01         -0.02
BM_spin_pause_after/8k/threads:4_mean                       +0.01         +0.01
BM_spin_pause_before_and_after/8_mean                       +0.00         +0.00
BM_spin_pause_before_and_after/512_mean                     +0.00         +0.00
BM_spin_pause_before_and_after/8k_mean                      +0.01         +0.01
BM_spin_pause_before_and_after/8/threads:4_mean             +0.00         +0.00
BM_spin_pause_before_and_after/512/threads:4_mean           +0.06         +0.04
BM_spin_pause_before_and_after/8k/threads:4_mean            -0.00         +0.02
BM_empty_stop_start_mean                                    +0.00         +0.00
BM_empty_stop_start/threads:4_mean                          +0.00         +0.00

There's still work to do on this, but I was hoping for initial feedback on the direction.

@AppVeyorBot: Build benchmark 411 failed (commit 3668875662 by @EricWF)

@AppVeyorBot: Build benchmark 412 failed (commit 13b4a6c641 by @EricWF)

@coveralls: Coverage decreased (-0.6%) to 87.059% when pulling 724ce26 on efcs:per-thread-timers into 6a28f1e on google:master.

@AppVeyorBot: Build benchmark 413 failed (commit b6f58fa5a2 by @EricWF)

@@ -861,19 +797,17 @@ RunBenchmark(const benchmark::internal::Benchmark::Instance& b,
     thread.join();
   }
   for (std::size_t ti = 0; ti < pool.size(); ++ti) {
-    pool[ti] = std::thread(&RunInThread, &b, iters, static_cast<int>(ti), &total);
+    pool[ti] = std::thread(&RunInThread, &b, iters,
+                           static_cast<int>(ti + 1), &manager);
@dmah42 (Member) commented on the diff, Sep 2, 2016

is this an existing bug or are you incrementing thread_id for some reason related to the PR?

with the explicit RunInThread 0 below, should this loop start with ti = 1?

@EricWF (Contributor, Author) replied

I changed it so the main thread always runs as thread 0 for multi-threaded benchmarks. Previously we just slept the main thread.
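The shape of that change can be sketched like this (an illustrative reconstruction, not the patch itself; the stand-in `RunInThread` here just records which thread ids actually ran):

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

std::mutex ids_mutex;
std::set<int> ran_ids;

// Stand-in for the real RunInThread(&b, iters, thread_id, &manager):
// records the id of each participating thread.
void RunInThread(int thread_id) {
  std::lock_guard<std::mutex> lock(ids_mutex);
  ran_ids.insert(thread_id);
}

// Launch num_threads - 1 workers with ids 1..N-1; the main thread
// participates as thread 0 instead of sleeping until the workers finish.
void RunBenchmarkThreads(int num_threads) {
  std::vector<std::thread> pool(num_threads - 1);
  for (std::size_t ti = 0; ti < pool.size(); ++ti)
    pool[ti] = std::thread(&RunInThread, static_cast<int>(ti + 1));
  RunInThread(0);  // main thread runs the benchmark body as thread 0
  for (auto& t : pool) t.join();
}
```

This is why the loop passes `ti + 1` rather than starting at `ti = 1`: the pool only holds the N-1 worker threads, and id 0 belongs to the main thread.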

@dmah42 (Member) commented Sep 2, 2016

the direction looks great. it's very clean and clearly more correct.

@AppVeyorBot: Build benchmark 414 failed (commit a6ab48a224 by @EricWF)

@EricWF (Contributor, Author) commented Sep 2, 2016

Unfortunately this patch is starting to get big. I had to fix a bunch of TSAN race conditions, including existing ones in CHECK and VLOG.

However, I have managed to remove all (?) of the global state used to run the benchmarks.


// TODO(ericwf): support MallocCounter.
//static benchmark::MallocCounter *benchmark_mc;
Mutex& getBenchmarkMutex() const
@dmah42 (Member) commented on the diff

can you do a format/style pass at some point? GetBenchmarkMutex, etc.

@EricWF (Contributor, Author) replied

I'll do it right now if you clarify what the style should be (other than FunctionNameAllUpper).

@dmah42 (Member) replied

i think that's all it is.. clang-format Google-style should take care of the rest (there's some extra indents around the place).

@EricWF (Contributor, Author) replied

Can we check in a .clang-format file to make formatting easier?

@dmah42 (Member) replied

done :)

@coveralls: Coverage decreased (-0.7%) to 86.931% when pulling fb608df on efcs:per-thread-timers into 6a28f1e on google:master.

@AppVeyorBot: Build benchmark 415 failed (commit a0eac36099 by @EricWF)

@coveralls: Coverage decreased (-0.7%) to 86.931% when pulling 48caee7 on efcs:per-thread-timers into 6a28f1e on google:master.

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling 448c797 on efcs:per-thread-timers into 94c2a30 on google:master.

@dmah42 (Member) commented Sep 2, 2016

this lgtm. have you run a comparison for all the benchmark tests to see how the timing changes?

@AppVeyorBot: Build benchmark 417 failed (commit 6e0eb519be by @EricWF)

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling 9130bea on efcs:per-thread-timers into 94c2a30 on google:master.

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling d2cbeac on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 418 failed (commit 92aefcfe62 by @EricWF)

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling ef4640b on efcs:per-thread-timers into 94c2a30 on google:master.

@EricWF (Contributor, Author) commented Sep 2, 2016

> this lgtm. have you run a comparison for all the benchmark tests to see how the timing changes?

I've done some. The results for basic_test.cc are at the top of this PR. I just need to test this a little more on OS X, which uses a really weird timer implementation.

@AppVeyorBot: Build benchmark 419 failed (commit c3a8866d99 by @EricWF)

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling 2a6e747 on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 420 failed (commit 7d3644d3cf by @EricWF)

@AppVeyorBot: Build benchmark 421 failed (commit 5867586ae3 by @EricWF)

@AppVeyorBot: Build benchmark 422 failed (commit 69e69bd07e by @EricWF)

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling 3e683c5 on efcs:per-thread-timers into 94c2a30 on google:master.

@coveralls: Coverage decreased (-0.5%) to 87.191% when pulling e8858c5 on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 423 failed (commit ebc1b81f5c by @EricWF)

@coveralls: Coverage decreased (-0.5%) to 87.133% when pulling 25d5659 on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 424 failed (commit 809a62661c by @EricWF)

@coveralls: Coverage decreased (-0.5%) to 87.125% when pulling 7bc016b on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 425 failed (commit 3b22deabfb by @EricWF)

@EricWF (Contributor, Author) commented Sep 3, 2016

> have you run a comparison for all the benchmark tests to see how the timing changes?

I have now. The results are very close to the same, except for threaded benchmarks. Surprisingly, threaded benchmarks that don't use PauseTiming still benefit, sometimes greatly.

Previously, all threads had to complete the KeepRunning loop before the timer could be paused, which essentially made each benchmark as slow as its slowest thread and added as much as 30% to the real time measurements. That has now been fixed.
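The arithmetic behind that effect can be shown with illustrative numbers (not measured data): with one global timer, the measured interval runs until the slowest thread finishes, while per-thread timers charge each thread only for its own work.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Old scheme: the global timer's interval spans until the slowest
// thread completes, so real time == max over threads.
double GlobalTimerSeconds(const std::vector<double>& thread_seconds) {
  return *std::max_element(thread_seconds.begin(), thread_seconds.end());
}

// New scheme: each thread's CPU time is measured independently and
// summed, so no single straggler dominates the measurement.
double SummedThreadSeconds(const std::vector<double>& thread_seconds) {
  return std::accumulate(thread_seconds.begin(), thread_seconds.end(), 0.0);
}
```

For example, if three threads take 1.0 s each and one straggler takes 1.3 s, the global timer reports 1.3 s of real time, 30% over the typical thread, whereas the per-thread sum (4.3 s total) weights the straggler no more than its actual share of the work.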

@EricWF EricWF merged commit cba945e into google:master Sep 3, 2016
@dmah42 (Member) commented Sep 6, 2016

wonderful!
