Ensure reported Time is walltime by removing spurious scaling by threads #1836

dmah42 · 2024-08-07T14:11:43Z

See #1834 for detailed description of why this is useful.

LebedevRI · 2024-08-07T14:21:50Z

JSON - yes, please.
But i'm not particularly a fan of adding it to console reporter, it's far too bloated as is...

dmah42 · 2024-08-07T15:16:54Z

given that the escalations we've had about this timing are focused on the console reporter, i think it's important to put it there.. hopefully no one is parsing it :D

src/json_reporter.cc

LebedevRI · 2024-08-07T16:36:59Z

Two alternatives:

- Print it only when it is actually needed
- Do the right thing, and print it instead of the existing time column :)

dmah42 · 2024-08-08T09:10:29Z

print it when needed: this is tricky with console because we print the header before we print the individual runs (unlike JSON).

do the right thing: is it the right thing though? also we would change existing benchmarks significantly which would break a lot of existing developer expectations.

LebedevRI · 2024-08-08T15:13:01Z

Honestly this is why i never pushed on this issue after encountering it years ago :/

> print it when needed: this is tricky with console because we print the header before we print the individual runs (unlike JSON).

We already do something similar for tabular user counters it seems.

> do the right thing: is it the right thing though? also we would change existing benchmarks significantly which would break a lot of existing developer expectations.

I understand that changing the actual underlying value would affect the results
(since the benchmark would now be run for N times more/less iterations),
and that just changing the output will, err, change the output,
but why can't the expectations change? It's just a scale change?
It's not like anyone parses that output :))

If anything, it makes the output consistent: "Time is the time that has passed on the clock on your wall,
CPU is always the total amount of time that has been spent by all threads on all CPU cores".

dmah42 · 2024-08-08T16:20:59Z

> Honestly this is why i never pushed on this issue after encountering it years ago :/

> > print it when needed: this is tricky with console because we print the header before we print the individual runs (unlike JSON).

> We already do something similar for tabular user counters it seems.

hm true. i guess we could add a new header row if a run has a threads value > 1 and the previous value was not.

> > do the right thing: is it the right thing though? also we would change existing benchmarks significantly which would break a lot of existing developer expectations.

> I understand that changing the actual underlying value would affect the results (since the benchmark would now be run for N times more/less iterations), and that just changing the output will, err, change the output, but why can't the expectations change? It's just a scale change? It's not like anyone parses that output :))

> If anything, it makes the output consistent: "Time is the time that has passed on the clock on your wall, CPU is always the total amount of time that has been spent by all threads on all CPU cores".

i think i agree. and i think way back this is actually how it worked. i'm going to bet that someone parses this output. https://www.hyrumslaw.com/

but maybe if we bump to 1.9 this is ok? wdyt.
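
A minimal sketch of the "new header row" idea mentioned above, assuming the reporter sees runs one at a time. `RunInfo`, `NeedsThreadedHeader` and `CountExtraHeaders` are hypothetical names for illustration and are not part of the library's reporter API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one benchmark run; not the library's Run type.
struct RunInfo {
  std::string name;
  int threads;
};

// Returns true when a fresh header row would be printed before `current`:
// the run uses more than one thread and the previous run (if any) did not.
bool NeedsThreadedHeader(const RunInfo* previous, const RunInfo& current) {
  assert(current.threads >= 1);
  const bool previous_threaded = previous != nullptr && previous->threads > 1;
  return current.threads > 1 && !previous_threaded;
}

// Counts how many extra header rows a sequence of runs would trigger.
int CountExtraHeaders(const std::vector<RunInfo>& runs) {
  int headers = 0;
  const RunInfo* previous = nullptr;
  for (const RunInfo& run : runs) {
    if (NeedsThreadedHeader(previous, run)) ++headers;
    previous = &run;
  }
  return headers;
}
```

With this rule a header is only re-printed at the boundary between single-threaded and multi-threaded runs, so purely single-threaded output would stay unchanged.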

LebedevRI · 2024-08-09T04:49:37Z

> i think i agree. and i think way back this is actually how it worked.

I haven't checked the actual output, but as far as i can tell,
the explicit scaling was added in #286.
But maybe the scaling was implicit before that.

> i'm going to bet that someone parses this output. https://www.hyrumslaw.com/

wink wink

> but maybe if we bump to 1.9 this is ok? wdyt.

Does the human-oriented display output count as public API?
I'd think bump 1.8 -> 1.9 is enough for this though.

dmah42 · 2024-08-09T10:24:32Z

i'm so very tempted just to change the Time definition and release it as 1.9. i think i want at least 1 other person to agree explicitly that this is what we should do. i've also asked the discord for opinions.

dmah42 · 2024-08-09T14:28:40Z

> i'm so very tempted just to change the Time definition and release it as 1.9. i think i want at least 1 other person to agree explicitly that this is what we should do. i've also asked the discord for opinions.

success (?) .. one other person on the discord agrees we should show the walltime as the Time and so this should be the new default.

@LebedevRI do you concur making it 3? :)

LebedevRI · 2024-08-09T14:46:18Z

Oh wait, i have to retroactively change my previous comments.
This really is about Wall Time, not CPU time? I hadn't realized that this time...
Then that rescaling makes even less sense.

One thing i hadn't thought about yet: so what happens with complexity reports and kIsRate user counters?

dmah42 · 2024-08-09T14:50:07Z

i'll change the default instead of just the reporting and we'll see.

dmah42 · 2024-08-09T14:53:01Z

also "Process CPU time". i need to figure out if that should be scaled or not. i think probably not now?

LebedevRI · 2024-08-09T14:55:58Z

> also "Process CPU time". i need to figure out if that should be scaled or not. i think probably not now?

AFAICT we only scale (divide by thread count) wall/manual times, not CPU time, no?

dmah42 · 2024-08-09T14:56:23Z

> > also "Process CPU time". i need to figure out if that should be scaled or not. i think probably not now?

> AFAICT we only scale (divide by thread count) wall/manual times, not CPU time, no?

we scale CPU time (today) if it's "process cpu time".
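
To make the scaling rules discussed here concrete, a rough model follows. It only encodes what the comments above say (wall and manual times divided by the thread count, CPU time divided only when process CPU time is measured). `Timings`, `ScaleLikeBefore` and `ReportUnscaled` are hypothetical names, not the library's code.

```cpp
#include <cassert>

// Hypothetical accumulated timings for one benchmark run, in seconds.
struct Timings {
  double real;    // wall-clock time
  double manual;  // time reported manually by the benchmark
  double cpu;     // CPU time
};

// Behaviour before this PR, as described in the thread: wall and manual
// times are divided by the thread count, and CPU time is divided as well
// only when process CPU time is being measured.
Timings ScaleLikeBefore(Timings t, int threads, bool process_cpu_time) {
  assert(threads >= 1);
  t.real /= threads;
  t.manual /= threads;
  if (process_cpu_time) t.cpu /= threads;
  return t;
}

// Behaviour argued for in the thread: report the timings without rescaling.
Timings ReportUnscaled(Timings t) { return t; }
```

Under this model a run that took 8 s of wall time on 4 threads would have been shown as 2 s before, and as 8 s once the scaling is dropped.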

dmah42 · 2024-08-09T15:33:18Z

it's really hard to say if it's better or worse. it does now show that threads don't affect walltime for benchmarks that don't actually parallelise whereas before it showed it being reduced.

ie in BM_CounterRates_Tabular, the benchmark multiplies two numbers. whether this is run using 1 thread or 1000, it should take the same amount of time. and now it does!

LebedevRI · 2024-08-09T15:43:46Z

Ah, so it looks like that CPU time rescaling was added in #763,
which had a lengthy discussion on a similar topic.
I think, indeed, we should, at the very least, just drop all of this rescaling, regardless of the clock in question.

src/console_reporter.cc

LebedevRI

What could possibly go wrong?

dmah42 · 2024-08-09T16:22:27Z

dude, i'm so torn. the tests we have aren't real-world enough for me to draw any conclusions.

i tried writing something that actually uses the threads (the total amount of work stays the same, but each thread only does a smaller part of it when there are more threads):

** old timing (main) **

----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_threaded/threads:1         10910273 ns     10910863 ns            1
BM_threaded/threads:2         25028050 ns     49863414 ns            2
BM_threaded/threads:4          9200662 ns     33926906 ns            4
BM_threaded/threads:8          6114852 ns     31858649 ns            8
BM_threaded/threads:16         3273450 ns     19625863 ns           16
BM_threaded/threads:32         1630770 ns     10109659 ns           32
BM_threaded/threads:64          761144 ns      4910742 ns           64
BM_threaded/threads:128         407445 ns      3065658 ns          128
BM_threaded/threads:256         123798 ns      1499209 ns          256
BM_threaded/threads:512          60811 ns       740970 ns          512
BM_threaded/threads:1024         11290 ns       267818 ns         1024
BM_threaded/threads:2048          1555 ns       124845 ns         2048
BM_threaded/threads:4096           194 ns        32155 ns         4096
BM_threaded/threads:8192          22.2 ns        17093 ns         8192
================================================================================

** new timing (branch) **

----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_threaded/threads:1         11579275 ns     11567352 ns            1
BM_threaded/threads:2         28931499 ns     28928913 ns            2
BM_threaded/threads:4         34379303 ns     31908936 ns            4
BM_threaded/threads:8         53041339 ns     41000491 ns            8
BM_threaded/threads:16        47934785 ns     16506729 ns           16
BM_threaded/threads:32        39456256 ns      5811854 ns           32
BM_threaded/threads:64        40633645 ns      3597184 ns           64
BM_threaded/threads:128       46344643 ns      2504077 ns          128
BM_threaded/threads:256       41461946 ns      1159753 ns          256
BM_threaded/threads:512       35033299 ns       612884 ns          512
BM_threaded/threads:1024      33181586 ns       340705 ns         1024
BM_threaded/threads:2048       4081943 ns       147625 ns         2048
BM_threaded/threads:4096        862596 ns        41268 ns         4096
BM_threaded/threads:8192        106611 ns        13279 ns         8192
================================================================================

the point at which the real time starts dropping (2048 threads) in the new timing tells me something about how my system can perform for this task. the old timing tells me nothing that the CPU timing didn't already tell me.

i think this convinces me that it's the right thing to do. @LebedevRI ?
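
The benchmark source is not shown in the thread. As a rough illustration of the kind of workload described (fixed total work, each thread taking a smaller slice), here is a standard-library-only sketch. `SumRange`, `SplitResult` and `RunSplit` are hypothetical names, and this is not the `BM_threaded` code that produced the tables above.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Sums the integers in [begin, end); stands in for one slice of the work.
std::uint64_t SumRange(std::uint64_t begin, std::uint64_t end) {
  std::uint64_t sum = 0;
  for (std::uint64_t i = begin; i < end; ++i) sum += i;
  return sum;
}

struct SplitResult {
  std::uint64_t total;  // combined result of all slices
  double wall_seconds;  // elapsed wall-clock time for the whole run
};

// Splits the range [0, n) into `threads` contiguous slices, runs each slice
// on its own thread, and measures the wall time until all threads finish.
SplitResult RunSplit(std::uint64_t n, unsigned threads) {
  assert(threads >= 1);
  std::vector<std::uint64_t> partial(threads, 0);
  std::vector<std::thread> workers;
  workers.reserve(threads);
  const auto start = std::chrono::steady_clock::now();
  for (unsigned t = 0; t < threads; ++t) {
    const std::uint64_t begin = n * t / threads;
    const std::uint64_t end = n * (t + 1) / threads;
    workers.emplace_back(
        [&partial, t, begin, end] { partial[t] = SumRange(begin, end); });
  }
  for (std::thread& worker : workers) worker.join();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  std::uint64_t total = 0;
  for (std::uint64_t p : partial) total += p;
  return {total, elapsed.count()};
}
```

Here `wall_seconds` is the unscaled quantity: it only drops when adding threads actually shortens the run, which is the signal the new Time column is meant to carry.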

LebedevRI · 2024-08-10T03:57:37Z

> i think this convinces me that it's the right thing to do. @LebedevRI ?

As i've consistently said previously, i don't see how the current behavior could be the expected one,
so i still agree with myself on that :) That being said, this really should result in v1.9.0 not v1.8.6.
I suspect that any dissenting opinions on this change will arrive after this is merged,
unless it can be merged into google first?

dmah42 · 2024-08-12T09:32:44Z

(running some tests internally)

dmah42 · 2024-08-13T17:11:29Z

we have a plan for how to roll this out. I will land it, release 1.9.0, send out some emails with warnings, and absorb complaints.

but I'm convinced this is a bug we should fix.

LebedevRI · 2024-08-13T17:13:30Z

Yay, so we are doing the right thing then :)

georgthegreat · 2024-08-21T16:37:35Z

@dmah42, could you please clarify the relation between this PR and the unaccepted one (#946)?

This is the only hunk of #946 that still applies on top of 1.9.0:

diff --git a/src/benchmark_runner.cc b/src/benchmark_runner.cc
index 7bc6b6329ef4..35912b5cdc46 100644
--- a/src/benchmark_runner.cc
+++ b/src/benchmark_runner.cc

+ // Adjust time stats to average since they were reported by all threads.
+ i.seconds /= b.threads;

Don't you think this division should be carried out?

LebedevRI · 2024-08-21T17:05:00Z

That line would affect the number of iterations for which a benchmark will be run.
By making i.seconds b.threads times smaller, we'd need to run b.threads times more iterations.
IOW #946 would have moved such scaling from being done globally (i.e. both for reports
and for iteration count prediction) to being done just for iteration count prediction,
whereas this PR just dropped all such scaling.
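
The effect on the iteration count can be sketched with a deliberately simplified predictor, assuming the next iteration count is chosen so that the run lasts about `min_time` seconds. `PredictIterations` and `PredictIterationsScaled` are hypothetical helpers. The library's real prediction applies further correction factors and limits that are left out here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Simplified model: scale the iteration count so that the next run lasts
// roughly `min_time` seconds, given that `iters` iterations took `seconds`.
std::int64_t PredictIterations(std::int64_t iters, double seconds,
                               double min_time) {
  assert(iters > 0 && seconds > 0.0 && min_time > 0.0);
  return static_cast<std::int64_t>(std::ceil(iters * (min_time / seconds)));
}

// Same prediction, but with the measured time first divided by the thread
// count, as the quoted hunk would do.
std::int64_t PredictIterationsScaled(std::int64_t iters, double seconds,
                                     double min_time, int threads) {
  assert(threads >= 1);
  return PredictIterations(iters, seconds / threads, min_time);
}
```

In this model, dividing the measured time by 4 threads makes the predictor ask for 4 times as many iterations, which is the consequence described above.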

As [#1836](google/benchmark#1836) has landed into upstream, there is no need to keep [#946](google/benchmark#946) as a patch. d9d33ff20e1e7767759fc07ea97c1e661716f26a

dmah42 requested a review from LebedevRI August 7, 2024 14:11

LebedevRI reviewed Aug 7, 2024

src/json_reporter.cc (outdated)

Introduce per-thread times to console and json reporters

3053d1b

See #1834 for detailed description of why this is useful.

dmah42 force-pushed the thread_times branch from 254652f to 3053d1b August 7, 2024 15:45

dmah42 added 2 commits August 7, 2024 16:47

clang-format

a1c840f

more clang-format

0d52a11

dmah42 marked this pull request as ready for review August 7, 2024 15:56

change the default to not scale

9a2d246

dmah42 added 2 commits August 9, 2024 16:33

remove unused statements

5fde8c0

clang-format

f834669

LebedevRI reviewed Aug 9, 2024

src/console_reporter.cc (outdated)

left a bit behind by mistake

4827765

LebedevRI approved these changes Aug 9, 2024

dmah42 changed the title ~~Introduce per-thread times to console and json reporters~~ Ensure reported Time is walltime by removing spurious scaling by threads Aug 12, 2024

dmah42 added this to the v1.9.0 milestone Aug 12, 2024

dmah42 merged commit a008bf8 into main Aug 13, 2024
98 checks passed

This was referenced Aug 13, 2024

Fix reported time for multithreaded benchmarks #946

Closed

Timer/counter adjustment by the ->Threads() could use a verification #769

Closed

[Q] Need help with understanding the results #1771

Closed

[BUG] Threads time taken report. #1834

Closed

LebedevRI mentioned this pull request Sep 8, 2024

[FR] PredictNumItersNeeded() 1.4 correction factor #1848

Open
