Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure reported Time is walltime by removing spurious scaling by threads #1836

Merged
merged 7 commits into from
Aug 13, 2024

Conversation

dmah42
Copy link
Member

@dmah42 dmah42 commented Aug 7, 2024

See #1834 for detailed description of why this is useful.

@dmah42 dmah42 requested a review from LebedevRI August 7, 2024 14:11
@LebedevRI
Copy link
Collaborator

JSON - yes, please.
But i'm not particularly a fan of adding it to console reporter, it's far too bloated as is...

@dmah42
Copy link
Member Author

dmah42 commented Aug 7, 2024

given the escalations we've had about this timing are focused on console reporter i think it's important to put it there.. hopefully noone is parsing it :D

src/json_reporter.cc Outdated Show resolved Hide resolved
See #1834 for detailed description of why this is useful.
@dmah42 dmah42 marked this pull request as ready for review August 7, 2024 15:56
@LebedevRI
Copy link
Collaborator

Two alternatives:

  • Print it only when it is actually needed
  • Do the right thing, and print it instead of the existing time column :)

@dmah42
Copy link
Member Author

dmah42 commented Aug 8, 2024

print it when needed: this is tricky with console because we print the header before we print the individual runs (unlike JSON).

do the right thing: is it the right thing though? also we would change existing benchmarks significantly which would break a lot of existing developer expectations.

@LebedevRI
Copy link
Collaborator

Honestly this is why i never pushed on this issue after encountering it years ago :/

print it when needed: this is tricky with console because we print the header before we print the individual runs (unlike JSON).

We already do something similar for tabular user counters it seems.

do the right thing: is it the right thing though? also we would change existing benchmarks significantly which would break a lot of existing developer expectations.

I understand that changing the actual underlying value would affect the results
(since the benchmark would now be run for N times more/less iterations),
and that just changing the output will, err, change the output,
but why can't the expectations change? It's just a scale change?
It's not like anyone parses that output :))

If anything, it makes the output consistent: "Time is the time passed on on the clock on your wall,
CPU is always the total amount of time that has been spent by all threads on all CPU cores".

@dmah42
Copy link
Member Author

dmah42 commented Aug 8, 2024

Honestly this is why i never pushed on this issue after encountering it years ago :/

print it when needed: this is tricky with console because we print the header before we print the individual runs (unlike JSON).

We already do something similar for tabular user counters it seems.

hm true. i guess we could add a new header row if a run has a threads value > 1 and the previous value was not.

do the right thing: is it the right thing though? also we would change existing benchmarks significantly which would break a lot of existing developer expectations.

I understand that changing the actual underlying value would affect the results (since the benchmark would now be run for N times more/less iterations), and that just changing the output will, err, change the output, but why can't the expectations change? It's just a scale change? It's not like anyone parses that output :))

If anything, it makes the output consistent: "Time is the time passed on on the clock on your wall, CPU is always the total amount of time that has been spent by all threads on all CPU cores".

i think i agree. and i think way back this is actually how it worked. i'm going to bet that someone parses this output. https://www.hyrumslaw.com/

but maybe if we bump to 1.9 this is ok? wdyt.

@LebedevRI
Copy link
Collaborator

i think i agree. and i think way back this is actually how it worked.

I haven't checked the actual output, but as far as i can tell,
the explicit scaling was added in #286.
But maybe the scaling was implicit before that.

i'm going to bet that someone parses this output. https://www.hyrumslaw.com/

wink wink

but maybe if we bump to 1.9 this is ok? wdyt.

Does the human-oriented display output count as public API?
I'd think bump 1.8 -> 1.9 is enough for this though.

@dmah42
Copy link
Member Author

dmah42 commented Aug 9, 2024

i'm so very tempted just to change the Time definition and release it as 1.9. i think i want at least 1 other person to agree explicitly that this is what we should do. i've also asked the discord for opinions.

@dmah42
Copy link
Member Author

dmah42 commented Aug 9, 2024

i'm so very tempted just to change the Time definition and release it as 1.9. i think i want at least 1 other person to agree explicitly that this is what we should do. i've also asked the discord for opinions.

success (?) .. one other person on the discord agrees we should show the walltime as the Time and so this should be the new default.

@LebedevRI do you concur making it 3? :)

@LebedevRI
Copy link
Collaborator

Oh wait, i have to retroactively change my previous comments.
This really is about Wall Time, not CPU time? I hadn't realized that this time...
Then that rescaling makes even less sense.

One thing i hadn't though about yet: so what happens with complexity reports and kIsRate user counters?

@dmah42
Copy link
Member Author

dmah42 commented Aug 9, 2024

i'll change the default instead of just the reporting and we'll see.

@dmah42
Copy link
Member Author

dmah42 commented Aug 9, 2024

also "Process CPU time". i need to figure out if that should be scaled or not. i think probably not now?

@LebedevRI
Copy link
Collaborator

also "Process CPU time". i need to figure out if that should be scaled or not. i think probably not now?

AFAICT we only scale (divide by thread count) wall/manual times, not CPU time, no?

@dmah42
Copy link
Member Author

dmah42 commented Aug 9, 2024

also "Process CPU time". i need to figure out if that should be scaled or not. i think probably not now?

AFAICT we only scale (divide by thread count) wall/manual times, not CPU time, no?

we scale CPU time (today) if it's "process cpu time".

@dmah42
Copy link
Member Author

dmah42 commented Aug 9, 2024

it's really hard to say if it's better or worse. it does now show that threads don't affect walltime for benchmarks that don't actually parallelise whereas before it showed it being reduced.

ie in BM_CounterRates_Tabular, the benchmark multiplies two numbers. whether this is run using 1 thread or 1000, it should take the same amount of time. and now it does!

@LebedevRI
Copy link
Collaborator

Ah, so it looks like that CPU time rescaling was added in #763,
which had a lengthy disscussion on a similar topic.
I think, indeed, we should, at the very least, just drop all of this rescaling, regardless of the clock in question.

src/console_reporter.cc Outdated Show resolved Hide resolved
Copy link
Collaborator

@LebedevRI LebedevRI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could possibly go wrong?

@dmah42
Copy link
Member Author

dmah42 commented Aug 9, 2024

dude, i'm so torn. the tests we have aren't real-world enough for me to draw any conclusions.

i tried writing something that actually uses the threads (doing the same amount of work but only doing a smaller part if there's more threads)

** old timing (main) **

----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_threaded/threads:1         10910273 ns     10910863 ns            1
BM_threaded/threads:2         25028050 ns     49863414 ns            2
BM_threaded/threads:4          9200662 ns     33926906 ns            4
BM_threaded/threads:8          6114852 ns     31858649 ns            8
BM_threaded/threads:16         3273450 ns     19625863 ns           16
BM_threaded/threads:32         1630770 ns     10109659 ns           32
BM_threaded/threads:64          761144 ns      4910742 ns           64
BM_threaded/threads:128         407445 ns      3065658 ns          128
BM_threaded/threads:256         123798 ns      1499209 ns          256
BM_threaded/threads:512          60811 ns       740970 ns          512
BM_threaded/threads:1024         11290 ns       267818 ns         1024
BM_threaded/threads:2048          1555 ns       124845 ns         2048
BM_threaded/threads:4096           194 ns        32155 ns         4096
BM_threaded/threads:8192          22.2 ns        17093 ns         8192
================================================================================

** new timing (branch) **

----------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations
----------------------------------------------------------------------
BM_threaded/threads:1         11579275 ns     11567352 ns            1
BM_threaded/threads:2         28931499 ns     28928913 ns            2
BM_threaded/threads:4         34379303 ns     31908936 ns            4
BM_threaded/threads:8         53041339 ns     41000491 ns            8
BM_threaded/threads:16        47934785 ns     16506729 ns           16
BM_threaded/threads:32        39456256 ns      5811854 ns           32
BM_threaded/threads:64        40633645 ns      3597184 ns           64
BM_threaded/threads:128       46344643 ns      2504077 ns          128
BM_threaded/threads:256       41461946 ns      1159753 ns          256
BM_threaded/threads:512       35033299 ns       612884 ns          512
BM_threaded/threads:1024      33181586 ns       340705 ns         1024
BM_threaded/threads:2048       4081943 ns       147625 ns         2048
BM_threaded/threads:4096        862596 ns        41268 ns         4096
BM_threaded/threads:8192        106611 ns        13279 ns         8192
================================================================================

the point at which the real time starts dropping (2048 threads) in the new timing tells me something about how my system can perform for this task. the old timing tells me nothing that the CPU timing didn't already tell me.

i think this convinces me that it's the right thing to do. @LebedevRI ?

@LebedevRI
Copy link
Collaborator

i think this convinces me that it's the right thing to do. @LebedevRI ?

As i've consistently said previously, i don't see how the current behavior could be the expected one,
so i still agree with myself on that :) That being said, this really should result in v1.9.0 not v1.8.6.
I suspect that any dissenting opinions on this change will arrive after this is merged,
unless it can be first merged into google first?

@dmah42 dmah42 changed the title Introduce per-thread times to console and json reporters Ensure reported Time is walltime by removing spurious scaling by threads Aug 12, 2024
@dmah42
Copy link
Member Author

dmah42 commented Aug 12, 2024

(running some tests internally)

@dmah42 dmah42 added this to the v1.9.0 milestone Aug 12, 2024
@dmah42
Copy link
Member Author

dmah42 commented Aug 13, 2024

we have a plan for how to roll this out. I will land it, release 1.9.0, send out some emails with warnings, and absorb complaints.

but I'm convinced this is a bug we should fix.

@dmah42 dmah42 merged commit a008bf8 into main Aug 13, 2024
98 checks passed
@LebedevRI
Copy link
Collaborator

Yay, so we are doing the right thing then :)

@georgthegreat
Copy link

@dmah42, could you, please, clarify the relation between this PR and (unaccepted one at #946)?

This is the only hunk that remains applicable atop of 1.9.0

diff --git a/src/benchmark_runner.cc b/src/benchmark_runner.cc
index 7bc6b6329ef4..35912b5cdc46 100644
--- a/src/benchmark_runner.cc
+++ b/src/benchmark_runner.cc

+ // Adjust time stats to average since they were reported by all threads.
+ i.seconds /= b.threads;

Don't you think this division should be carried out?

@LebedevRI
Copy link
Collaborator

That line would affect the number of iterations for which a benchmark will be run.
By making i.seconds b.threads times smaller, we'd require to run b.threads times more iterations.
IOW #946 would have moved such scaling from the doing that globally (i.e. both for reports,
and iteration count prediction), to doing that just for iteration count prediction,
whereas this PR just dropped all such scaling.

robot-piglet added a commit to ytsaurus/ytsaurus that referenced this pull request Aug 23, 2024
As [#1836](google/benchmark#1836) has landed into upstream, there is no need to keep [#946](google/benchmark#946) as a patch.
d9d33ff20e1e7767759fc07ea97c1e661716f26a
robot-piglet added a commit to ytsaurus/ytsaurus-cpp-sdk that referenced this pull request Aug 23, 2024
As [#1836](google/benchmark#1836) has landed into upstream, there is no need to keep [#946](google/benchmark#946) as a patch.
d9d33ff20e1e7767759fc07ea97c1e661716f26a
robot-piglet added a commit to ytsaurus/ytsaurus that referenced this pull request Aug 23, 2024
As [#1836](google/benchmark#1836) has landed into upstream, there is no need to keep [#946](google/benchmark#946) as a patch.
d9d33ff20e1e7767759fc07ea97c1e661716f26a
robot-piglet added a commit to catboost/catboost that referenced this pull request Aug 23, 2024
As [#1836](google/benchmark#1836) has landed into upstream, there is no need to keep [#946](google/benchmark#946) as a patch.
d9d33ff20e1e7767759fc07ea97c1e661716f26a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants