Add Jmh benchmark submodule #109

spkrka · 2021-06-03T14:30:27Z

Summary:
The distribution metric is faster when it's not contended,
but degrades during heavy contention.

Analysis:
Histogram throughput scales almost linearly with thread count up to 4 threads,
then levels off at about 53 ops/us with both 4 and 8 threads.
Contention does not appear to be an issue there.

Distribution throughput is best with a single thread, peaking at
47 ops/us, and drops to 13 ops/us with 4 threads. Going to 8 threads
actually increases throughput again, to 26 ops/us.
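The scaling claims above can be checked directly against the thrpt rows. The following is a minimal sketch, not part of the PR: the class and method names are made up for illustration, and the scores are copied by hand from the results table.

```java
import java.util.Locale;

// Sketch: turns the thrpt rows into speedup and per-thread efficiency figures.
// ScalingCheck, speedup and efficiency are hypothetical names, not code from the PR.
final class ScalingCheck {

    // Speedup relative to the single-threaded run (1.0 means no change).
    static double speedup(double singleThreadOpsPerUs, double multiThreadOpsPerUs) {
        return multiThreadOpsPerUs / singleThreadOpsPerUs;
    }

    // Fraction of ideal linear scaling achieved with the given thread count.
    static double efficiency(double singleThreadOpsPerUs, double multiThreadOpsPerUs, int threads) {
        return speedup(singleThreadOpsPerUs, multiThreadOpsPerUs) / threads;
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 4, 8};
        // Scores copied from the thrpt rows of the results table (ops/us).
        double[] dist = {47.352, 16.576, 13.032, 26.443};
        double[] hist = {18.029, 35.253, 52.909, 53.726};

        for (int i = 0; i < threads.length; i++) {
            System.out.printf(Locale.ROOT,
                    "%d threads: dist speedup %.2fx (efficiency %.2f), "
                            + "hist speedup %.2fx (efficiency %.2f)%n",
                    threads[i],
                    speedup(dist[0], dist[i]), efficiency(dist[0], dist[i], threads[i]),
                    speedup(hist[0], hist[i]), efficiency(hist[0], hist[i], threads[i]));
        }
    }
}
```

With these inputs, histogram reaches roughly a 2.9x speedup at 4 threads, while distribution falls to well under its single-threaded throughput.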

With a single thread, histogram averages 0.055 us/op and distribution
averages 0.021 us/op, which is more than twice as fast.

However, with 4 threads running concurrently, distribution averages
0.302 us/op (about 14x slower than single-threaded), while histogram averages
0.073 us/op (1.3x slower). With 8 threads, distribution reaches 0.325 us/op
(about 15x slower).
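The slowdown factors can be recomputed from the avgt rows of the results table. This is a small sketch for checking the arithmetic; the class and method names are hypothetical and the values are copied by hand from the table.

```java
import java.util.Locale;

// Sketch: recomputes per-operation slowdown factors from the avgt rows.
// SlowdownCheck and slowdown are hypothetical names, not code from the PR.
final class SlowdownCheck {

    // Ratio of a multi-threaded average time to the single-threaded baseline (both in us/op).
    static double slowdown(double baselineUsPerOp, double contendedUsPerOp) {
        return contendedUsPerOp / baselineUsPerOp;
    }

    public static void main(String[] args) {
        // Values copied from the avgt rows of the results table (us/op).
        double dist1 = 0.021, dist4 = 0.302, dist8 = 0.325;
        double hist1 = 0.055, hist4 = 0.073, hist8 = 0.148;

        System.out.printf(Locale.ROOT, "dist, 4 threads: %.1fx slower%n", slowdown(dist1, dist4));
        System.out.printf(Locale.ROOT, "dist, 8 threads: %.1fx slower%n", slowdown(dist1, dist8));
        System.out.printf(Locale.ROOT, "hist, 4 threads: %.1fx slower%n", slowdown(hist1, hist4));
        System.out.printf(Locale.ROOT, "hist, 8 threads: %.1fx slower%n", slowdown(hist1, hist8));
    }
}
```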

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                            Mode       Cnt      Score    Error   Units
DistributionBenchmark.dist1                         thrpt        10     47.352 ±  0.815  ops/us
DistributionBenchmark.dist2                         thrpt        10     16.576 ±  0.142  ops/us
DistributionBenchmark.dist4                         thrpt        10     13.032 ±  0.244  ops/us
DistributionBenchmark.dist8                         thrpt        10     26.443 ±  0.857  ops/us
DistributionBenchmark.hist1                         thrpt        10     18.029 ±  0.069  ops/us
DistributionBenchmark.hist2                         thrpt        10     35.253 ±  0.570  ops/us
DistributionBenchmark.hist4                         thrpt        10     52.909 ±  1.103  ops/us
DistributionBenchmark.hist8                         thrpt        10     53.726 ±  1.669  ops/us
DistributionBenchmark.dist1                          avgt        10      0.021 ±  0.001   us/op
DistributionBenchmark.dist2                          avgt        10      0.127 ±  0.007   us/op
DistributionBenchmark.dist4                          avgt        10      0.302 ±  0.012   us/op
DistributionBenchmark.dist8                          avgt        10      0.325 ±  0.030   us/op
DistributionBenchmark.hist1                          avgt        10      0.055 ±  0.001   us/op
DistributionBenchmark.hist2                          avgt        10      0.057 ±  0.001   us/op
DistributionBenchmark.hist4                          avgt        10      0.073 ±  0.003   us/op
DistributionBenchmark.hist8                          avgt        10      0.148 ±  0.006   us/op
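The thrpt and avgt rows can be cross-checked against each other. The sketch below assumes that thrpt is aggregated across all benchmark threads while avgt is the per-thread time per operation, so that throughput should be close to threads / avgt. That assumption is not stated in the PR, although the numbers here appear to line up with it. The class and method names are hypothetical.

```java
import java.util.Locale;

// Sketch: compares measured throughput with the throughput implied by average time.
// Assumes thrpt is summed over threads and avgt is per-thread (an assumption, see lead-in).
final class ModeConsistencyCheck {

    // Aggregate throughput (ops/us) implied by a per-thread average time (us/op).
    static double impliedThroughput(int threads, double avgUsPerOp) {
        return threads / avgUsPerOp;
    }

    // Relative difference between measured and implied throughput.
    static double relativeGap(double measuredOpsPerUs, double impliedOpsPerUs) {
        return Math.abs(measuredOpsPerUs - impliedOpsPerUs) / measuredOpsPerUs;
    }

    public static void main(String[] args) {
        String[] names = {"dist1", "dist2", "dist4", "dist8", "hist1", "hist2", "hist4", "hist8"};
        int[] threads = {1, 2, 4, 8, 1, 2, 4, 8};
        // Scores copied from the results table.
        double[] thrpt = {47.352, 16.576, 13.032, 26.443, 18.029, 35.253, 52.909, 53.726};
        double[] avgt = {0.021, 0.127, 0.302, 0.325, 0.055, 0.057, 0.073, 0.148};

        for (int i = 0; i < names.length; i++) {
            double implied = impliedThroughput(threads[i], avgt[i]);
            System.out.printf(Locale.ROOT, "%s: measured %.1f ops/us, implied %.1f ops/us, gap %.1f%%%n",
                    names[i], thrpt[i], implied, 100.0 * relativeGap(thrpt[i], implied));
        }
    }
}
```

Under that assumption, the dist8 row is not contradictory: aggregate throughput rises from 4 to 8 threads even though each individual operation takes slightly longer.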

spkrka force-pushed the krka/jmh branch from 0cab008 to 606e9c0 on June 3, 2021 14:31

ao2017 approved these changes Jun 3, 2021

ao2017 merged commit 0412962 into master Jun 3, 2021

delete-merged-branch bot deleted the krka/jmh branch June 3, 2021 19:20