
Make PauseTiming() and ResumeTiming() per thread. #286

Merged (21 commits) on Sep 3, 2016

Conversation

@EricWF (Contributor) commented Sep 2, 2016

Currently we time benchmarks using a single global timer that tracks per-process CPU usage. Pausing and resuming this timer has to act as a barrier across all threads, which has crippling effects on multi-threaded benchmarks: if you pause in every iteration you synchronize the entire benchmark, and it is effectively no longer multi-threaded.

This patch switches to per-thread timers. Instead of measuring process CPU time, we sum the CPU time of each thread, and pausing happens on a per-thread basis.
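The approach can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `ThreadTimer` and `SumThreadCpuTime` are hypothetical names, and the POSIX `CLOCK_THREAD_CPUTIME_ID` clock stands in for the platform-specific thread clocks a real implementation has to handle.

```cpp
#include <time.h>
#include <cassert>
#include <thread>
#include <vector>

// Per-thread CPU timer: each thread reads its own CLOCK_THREAD_CPUTIME_ID,
// so pausing one thread's timer never blocks the other threads.
struct ThreadTimer {
  double elapsed = 0.0, start = 0.0;
  static double ThreadCpuNow() {
    timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
  }
  void Resume() { start = ThreadCpuNow(); }
  void Pause() { elapsed += ThreadCpuNow() - start; }
};

// Run n threads that each do timed work with an untimed pause in the
// middle, then sum the per-thread CPU times (what gets reported as CPU).
double SumThreadCpuTime(int n) {
  std::vector<double> results(n);
  std::vector<std::thread> pool;
  for (int i = 0; i < n; ++i) {
    pool.emplace_back([&results, i] {
      ThreadTimer t;
      volatile long sink = 0;
      t.Resume();
      for (long j = 0; j < 1000000; ++j) sink += j;
      t.Pause();  // untimed section: no barrier, other threads keep running
      t.Resume();
      for (long j = 0; j < 1000000; ++j) sink += j;
      t.Pause();
      results[i] = t.elapsed;
    });
  }
  for (auto& th : pool) th.join();
  double total = 0.0;
  for (double r : results) total += r;
  return total;
}
```

The key property is that `Pause()` only touches thread-local state, so a pause in one thread's iteration costs nothing to the others.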

Below is a comparison of the new and old results from basic_test.cc. Note that the threaded BM_spin_pause_during tests get up to 95% faster.

Benchmark                                                    Time           CPU
-------------------------------------------------------------------------------
BM_empty_mean                                               +0.00         +0.00
BM_empty/threads:4_mean                                     +0.00         +0.00
BM_spin_empty/8_mean                                        +0.00         +0.00
BM_spin_empty/512_mean                                      +0.01         +0.00
BM_spin_empty/8k_mean                                       -0.00         -0.00
BM_spin_empty/8/threads:4_mean                              +0.00         -0.17
BM_spin_empty/512/threads:4_mean                            +0.01         +0.00
BM_spin_empty/8k/threads:4_mean                             -0.01         +0.00
BM_spin_pause_before/8_mean                                 +0.00         +0.00
BM_spin_pause_before/512_mean                               +0.03         +0.02
BM_spin_pause_before/8k_mean                                +0.01         +0.01
BM_spin_pause_before/8/threads:4_mean                       +0.00         +0.00
BM_spin_pause_before/512/threads:4_mean                     +0.04         +0.01
BM_spin_pause_before/8k/threads:4_mean                      -0.03         -0.00
BM_spin_pause_during/8_mean                                 -0.24         -0.25
BM_spin_pause_during/512_mean                               -0.24         -0.24
BM_spin_pause_during/8k_mean                                -0.13         -0.13
BM_spin_pause_during/8/threads:4_mean                       -0.97         -0.90
BM_spin_pause_during/512/threads:4_mean                     -0.96         -0.89
BM_spin_pause_during/8k/threads:4_mean                      -0.95         -0.85
BM_pause_during_mean                                        -0.23         -0.20
BM_pause_during/threads:4_mean                              -0.97         -0.90
BM_pause_during/real_time_mean                              -0.24         -0.26
BM_pause_during/real_time/threads:4_mean                    -0.97         -0.90
BM_spin_pause_after/8_mean                                  +0.00         +0.00
BM_spin_pause_after/512_mean                                +0.00         +0.00
BM_spin_pause_after/8k_mean                                 -0.00         -0.00
BM_spin_pause_after/8/threads:4_mean                        +0.00         +0.00
BM_spin_pause_after/512/threads:4_mean                      -0.01         -0.02
BM_spin_pause_after/8k/threads:4_mean                       +0.01         +0.01
BM_spin_pause_before_and_after/8_mean                       +0.00         +0.00
BM_spin_pause_before_and_after/512_mean                     +0.00         +0.00
BM_spin_pause_before_and_after/8k_mean                      +0.01         +0.01
BM_spin_pause_before_and_after/8/threads:4_mean             +0.00         +0.00
BM_spin_pause_before_and_after/512/threads:4_mean           +0.06         +0.04
BM_spin_pause_before_and_after/8k/threads:4_mean            -0.00         +0.02
BM_empty_stop_start_mean                                    +0.00         +0.00
BM_empty_stop_start/threads:4_mean                          +0.00         +0.00

There's still work to do on this, but I was hoping for initial feedback on the direction.

@AppVeyorBot: Build benchmark 411 failed (commit 3668875662 by @EricWF)

@AppVeyorBot: Build benchmark 412 failed (commit 13b4a6c641 by @EricWF)

@coveralls: Coverage decreased (-0.6%) to 87.059% when pulling 724ce26 on efcs:per-thread-timers into 6a28f1e on google:master.

@AppVeyorBot: Build benchmark 413 failed (commit b6f58fa5a2 by @EricWF)

@@ -861,19 +797,17 @@ RunBenchmark(const benchmark::internal::Benchmark::Instance& b,
     thread.join();
   }
   for (std::size_t ti = 0; ti < pool.size(); ++ti) {
-    pool[ti] = std::thread(&RunInThread, &b, iters, static_cast<int>(ti), &total);
+    pool[ti] = std::thread(&RunInThread, &b, iters,
+                           static_cast<int>(ti + 1), &manager);
@dmah42 (Member) commented on the diff, Sep 2, 2016

is this an existing bug or are you incrementing thread_id for some reason related to the PR?

with the explicit RunInThread 0 below, should this loop start with ti = 1?

@EricWF (Contributor, Author) replied

I changed it so the main thread always runs as thread 0 for multi-threaded benchmarks. Previously we just slept the main thread.
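The shape of that change can be sketched like this (an illustrative reconstruction, not the patch itself; the stand-in `RunInThread` here just records which thread ids actually ran):

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

std::mutex ids_mutex;
std::set<int> ran_ids;

// Stand-in for the real RunInThread(&b, iters, thread_id, &manager):
// records the id of each participating thread.
void RunInThread(int thread_id) {
  std::lock_guard<std::mutex> lock(ids_mutex);
  ran_ids.insert(thread_id);
}

// Launch num_threads - 1 workers with ids 1..N-1; the main thread
// participates as thread 0 instead of sleeping until the workers finish.
void RunBenchmarkThreads(int num_threads) {
  std::vector<std::thread> pool(num_threads - 1);
  for (std::size_t ti = 0; ti < pool.size(); ++ti)
    pool[ti] = std::thread(&RunInThread, static_cast<int>(ti + 1));
  RunInThread(0);  // main thread runs the benchmark body as thread 0
  for (auto& t : pool) t.join();
}
```

This is why the loop passes `ti + 1` rather than starting at `ti = 1`: the pool only holds the N-1 worker threads, and id 0 belongs to the main thread.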

@dmah42 (Member) commented Sep 2, 2016

the direction looks great. it's very clean and clearly more correct.

@AppVeyorBot: Build benchmark 414 failed (commit a6ab48a224 by @EricWF)

@EricWF (Contributor, Author) commented Sep 2, 2016

Unfortunately this patch is starting to get big. I had to fix a bunch of TSAN race conditions, including existing ones in CHECK and VLOG.

However, I have managed to remove all (?) of the global state used to run the benchmarks.


// TODO(ericwf): support MallocCounter.
//static benchmark::MallocCounter *benchmark_mc;
Mutex& getBenchmarkMutex() const
@dmah42 (Member) commented on the diff

can you do a format/style pass at some point? GetBenchmarkMutex, etc.

@EricWF (Contributor, Author) replied

I'll do it right now if you clarify what the style should be (other than FunctionNameAllUpper).

@dmah42 (Member) replied

i think that's all it is.. clang-format Google-style should take care of the rest (there's some extra indents around the place).

@EricWF (Contributor, Author) replied

Can we check in a .clang-format file to make formatting easier?

@dmah42 (Member) replied

done :)

@coveralls: Coverage decreased (-0.7%) to 86.931% when pulling fb608df on efcs:per-thread-timers into 6a28f1e on google:master.

@AppVeyorBot: Build benchmark 415 failed (commit a0eac36099 by @EricWF)

@coveralls: Coverage decreased (-0.7%) to 86.931% when pulling 48caee7 on efcs:per-thread-timers into 6a28f1e on google:master.

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling 448c797 on efcs:per-thread-timers into 94c2a30 on google:master.

@dmah42 (Member) commented Sep 2, 2016

this lgtm. have you run a comparison for all the benchmark tests to see how the timing changes?

@AppVeyorBot: Build benchmark 417 failed (commit 6e0eb519be by @EricWF)

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling 9130bea on efcs:per-thread-timers into 94c2a30 on google:master.

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling d2cbeac on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 418 failed (commit 92aefcfe62 by @EricWF)

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling ef4640b on efcs:per-thread-timers into 94c2a30 on google:master.

@EricWF (Contributor, Author) commented Sep 2, 2016

> this lgtm. have you run a comparison for all the benchmark tests to see how the timing changes?

I've done some. The results for basic_test.cc are at the top of this PR. I just need to test this a little more on OS X, which uses a really weird timer implementation.

@AppVeyorBot: Build benchmark 419 failed (commit c3a8866d99 by @EricWF)

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling 2a6e747 on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 420 failed (commit 7d3644d3cf by @EricWF)

@AppVeyorBot: Build benchmark 421 failed (commit 5867586ae3 by @EricWF)

@AppVeyorBot: Build benchmark 422 failed (commit 69e69bd07e by @EricWF)

@coveralls: Coverage decreased (-0.8%) to 86.826% when pulling 3e683c5 on efcs:per-thread-timers into 94c2a30 on google:master.

@coveralls: Coverage decreased (-0.5%) to 87.191% when pulling e8858c5 on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 423 failed (commit ebc1b81f5c by @EricWF)

@coveralls: Coverage decreased (-0.5%) to 87.133% when pulling 25d5659 on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 424 failed (commit 809a62661c by @EricWF)

@coveralls: Coverage decreased (-0.5%) to 87.125% when pulling 7bc016b on efcs:per-thread-timers into 94c2a30 on google:master.

@AppVeyorBot: Build benchmark 425 failed (commit 3b22deabfb by @EricWF)

@EricWF (Contributor, Author) commented Sep 3, 2016

> have you run a comparison for all the benchmark tests to see how the timing changes?

I have now. The results are very close to the same, except for threaded benchmarks. Surprisingly, threaded benchmarks that don't use PauseTiming still benefit, sometimes greatly.

Previously, all threads had to complete the KeepRunning loop before the timer could be paused, which essentially made each benchmark as slow as its slowest thread and added as much as 30% to the real time measurements. That has now been fixed.
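The arithmetic behind that effect can be shown with illustrative numbers (not measured data): with one global timer, the measured interval runs until the slowest thread finishes, while per-thread timers charge each thread only for its own work.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Old scheme: the global timer's interval spans until the slowest
// thread completes, so real time == max over threads.
double GlobalTimerSeconds(const std::vector<double>& thread_seconds) {
  return *std::max_element(thread_seconds.begin(), thread_seconds.end());
}

// New scheme: each thread's CPU time is measured independently and
// summed, so no single straggler dominates the measurement.
double SummedThreadSeconds(const std::vector<double>& thread_seconds) {
  return std::accumulate(thread_seconds.begin(), thread_seconds.end(), 0.0);
}
```

For example, if three threads take 1.0 s each and one straggler takes 1.3 s, the global timer reports 1.3 s of real time, 30% over the typical thread, whereas the per-thread sum (4.3 s total) weights the straggler no more than its actual share of the work.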

@EricWF EricWF merged commit cba945e into google:master Sep 3, 2016
@dmah42 (Member) commented Sep 6, 2016

wonderful!
