
Bench Mandelbrot for algebraic speed assessment #190

Closed
ckormanyos opened this issue Jan 17, 2025 · 10 comments
Labels: enhancement, optimization

@ckormanyos (Member)

The purpose of this issue is to do some dedicated algebraic performance testing of the double-float backend against various similar competitors. This topic may have a bit of depth, so we open a separate issue here for the dedicated discussion.
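For a concrete picture of what is being timed: the tests exercise elementary algebra (multiply, add, compare) in a tight loop, essentially the Mandelbrot escape-time iteration. The following is a minimal, illustrative sketch of such a kernel, not the actual YADE or ckormanyos/mandelbrot code; it is templated on the number type so different Boost.Multiprecision backends can be swapped in for a rough comparison.

```cpp
// Illustrative only: a minimal escape-time kernel of the kind such a
// benchmark exercises, templated on the number type so that different
// Boost.Multiprecision backends can be swapped in. Not the actual
// YADE or ckormanyos/mandelbrot code.
#include <boost/multiprecision/cpp_bin_float.hpp>

#include <cstdint>
#include <iostream>

template <typename Real>
std::uint32_t mandel_iters(const Real& cr, const Real& ci, std::uint32_t max_iter)
{
   Real zr(0), zi(0);

   std::uint32_t i = 0U;

   while (i < max_iter)
   {
      const Real zr2 = zr * zr;
      const Real zi2 = zi * zi;

      if (zr2 + zi2 > Real(4)) { break; }   // |z| > 2: the point escapes

      zi = Real(2) * zr * zi + ci;          // Im(z^2 + c)
      zr = zr2 - zi2 + cr;                  // Re(z^2 + c)

      ++i;
   }

   return i;
}

int main()
{
   // Swap in cpp_double_double / cpp_double_long_double from the development
   // branch, or float128, to compare backends on the same kernel.
   using big_float = boost::multiprecision::cpp_bin_float_50;

   std::cout << mandel_iters(big_float("-0.7453"), big_float("0.1127"), 10000U) << '\n';
}
```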

Cc: @cosurgi

@cosurgi (Collaborator) commented Jan 17, 2025

I finally have some good news! Here's the latest YADE benchmark, run with yade -n --quickperformance -j 4. I am working with the cpp_double_fp_backend branch at commit c6ce52d.

The trick was to use a different compiler!

clang++ 19.1.7, -O3

| type | compiler | calculation speed | factor |
|---|---|---|---|
| cpp_double_double | clang++ 19.1.7 | 205.0179 iter/sec | 1 |
| cpp_bin_float<32> | clang++ 19.1.7 | 95.3581 iter/sec | 2.14 |
| cpp_dec_float<31> | clang++ 19.1.7 | 49.1410 iter/sec | 4.17 |
| mpfr_float_backend<31> | clang++ 19.1.7 | 31.7974 iter/sec | 6.44 |

g++ 14.2, -O3 (factors relative to the clang++ cpp_double_double result above)

| type | compiler | calculation speed | factor |
|---|---|---|---|
| float128 | g++ 14.2 | 165.5817 iter/sec | 1.23 |
| cpp_bin_float<32> | g++ 14.2 | 98.4807 iter/sec | 2.08 |
| cpp_dec_float<31> | g++ 14.2 | 51.2811 iter/sec | 3.99 |
| cpp_double_double | g++ 14.2 | 34.2752 iter/sec | 5.98 |
| mpfr_float_backend<31> | g++ 14.2 | 30.6851 iter/sec | 6.68 |

So all other types perform at nearly the same speed with both compilers. The only exception is cpp_double_double, which shows a huge difference in performance: it is six times slower with g++! Is that a problem with g++? What else could it be?
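Purely as a hedged aside (later comments in this thread point at the CPU generation rather than at g++ itself): a double-double type is typically built from a handful of error-free transformations, so its speed hinges almost entirely on how tightly the compiler schedules a few adds and multiplies and on whether a hardware FMA is available and emitted. The sketch below shows the classical building blocks; it is not the cpp_double_fp_backend source, only an illustration of why code-generation differences can swing this particular backend so much while cpp_bin_float, cpp_dec_float, and MPFR, which spend their time in larger routines, barely move.

```cpp
// Classical error-free transformations of the kind a double-double type is
// usually built from (Knuth two_sum, FMA-based two_prod). This is NOT the
// cpp_double_fp_backend source; it only illustrates why the backend's speed
// is so sensitive to code generation: everything reduces to a handful of
// adds/multiplies that must be scheduled tightly, plus (for the product)
// a hardware FMA.
#include <cmath>
#include <utility>

// Returns (s, e) with s + e == a + b exactly, where s = fl(a + b).
inline std::pair<double, double> two_sum(double a, double b)
{
   const double s  = a + b;
   const double bv = s - a;
   const double e  = (a - (s - bv)) + (b - bv);
   return { s, e };
}

// Returns (p, e) with p + e == a * b exactly, where p = fl(a * b).
inline std::pair<double, double> two_prod(double a, double b)
{
   const double p = a * b;
   const double e = std::fma(a, b, -p);   // exact residual via fused multiply-add
   return { p, e };
}
```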

And here are the results for cpp_double_long_double

clang++ 19.1.7, cpp_double_long_double

| type | compiler | calculation speed | factor |
|---|---|---|---|
| cpp_double_long_double | clang++ 19.1.7 | 88.2753 iter/sec | 1 |
| cpp_bin_float<39> | clang++ 19.1.7 | 59.1809 iter/sec | 1.49 |
| cpp_dec_float<39> | clang++ 19.1.7 | 45.8211 iter/sec | 1.92 |
| mpfr_float_backend<39> | clang++ 19.1.7 | 28.8586 iter/sec | 3.05 |

g++ 14.2, cpp_double_long_double

| type | compiler | calculation speed | factor |
|---|---|---|---|
| cpp_bin_float<39> | g++ 14.2 | 60.7156 iter/sec | 1.45 |
| cpp_dec_float<39> | g++ 14.2 | 46.4605 iter/sec | 1.90 |
| mpfr_float_backend<39> | g++ 14.2 | 27.2021 iter/sec | 3.24 |
| cpp_double_long_double | g++ 14.2 | 15.5230 iter/sec | 5.68 |

Again, g++ is six times slower than clang++ for cpp_double_long_double.

So Chris, if you used a different compiler for your Mandelbrot float128 vs. cpp_double_double benchmark, that may explain the discrepancies. But AFAIK only g++ supports float128, so maybe there is still something to explain. But at least I got some good results from clang!

@ckormanyos (Member, Author)

> I finally have some good news!

This is indeed going in the right direction. But the mystery with g++ still confuses us (sadly).

> So all other types perform at nearly the same speed with both compilers. The only exception is cpp_double_double, which shows a huge difference in performance: it is six times slower with g++! Is that a problem with g++? What else could it be?

This is our last really big open issue.

In the next post, I provide my Mandelbrot benchmarks.

@ckormanyos (Member, Author) commented Jan 18, 2025

Hi Janek (@cosurgi)

I've prepared the Mandelbrot benchmark, and you can hopefully run it successfully locally.

In ckormanyos/mandelbrot, you will find the option_cpp_double_double branch.

On commit 40004acef31953f4b25eeb38e452446990ad55f9, the benchmark is ready for your consumption. You will need to make a few tiny adjustments when calling build_all.sh to build for each backend.

Building

First locate build_all.sh. Then you can build in the bash shell with a command like:

./build_all.sh --boost='-I/mnt/c/ChrisGitRepos/boost_gsoc2021/multiprecision/include -I/mnt/c/boost/boost_1_87_0' --my_cc=clang++ --stdcc=gnu++20

You can change the compiler, the Boost location(s), and the language standard on the command line. The order of the parameters does not matter. The default checked in builds for cpp_dec_float<32>.

In order to change to, let's say, cpp_double_double, you must:

  • Go into build_all.sh. There you must, sadly, edit manually.
  • Uncomment the build line with -DMANDELBROT_USE_DOUBLE_DOUBLE (this is line 74).
  • But then DO comment out line 73.
  • You must link with -lquadmath, so comment/uncomment the pair of lines 79/80.
  • Follow a similar procedure for -DMANDELBROT_USE_FLOAT128. A hypothetical sketch of the backend selection is shown after this list.
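To make the macro switching a bit more concrete, here is a hypothetical sketch of how the -DMANDELBROT_USE_* definitions could select the numeric type on the C++ side. The header name boost/multiprecision/cpp_double_fp.hpp and the alias cpp_double_double are assumptions about the development branch, and the actual selection logic in ckormanyos/mandelbrot may differ.

```cpp
// Hypothetical sketch of how the -DMANDELBROT_USE_* switches might map to a
// numeric type; the header/alias names for the double-double backend are
// assumptions about the development branch, and the real selection logic in
// ckormanyos/mandelbrot may differ.
#if defined(MANDELBROT_USE_DOUBLE_DOUBLE)
  #include <boost/multiprecision/cpp_double_fp.hpp>      // assumed header name
  using mandelbrot_real = boost::multiprecision::cpp_double_double;
#elif defined(MANDELBROT_USE_FLOAT128)
  #include <boost/multiprecision/float128.hpp>           // requires GCC and -lquadmath
  using mandelbrot_real = boost::multiprecision::float128;
#else
  #include <boost/multiprecision/cpp_dec_float.hpp>      // default backend
  using mandelbrot_real =
     boost::multiprecision::number<boost::multiprecision::cpp_dec_float<32>>;
#endif
```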

@ckormanyos (Member, Author)

My timings on WSL2 are as follows:

Using g++

| backend | time |
|---|---|
| cpp_double_double | 19s |
| float128 | 41s |
| cpp_dec_float<32> | 95s |

Using clang++

| backend | time |
|---|---|
| cpp_double_double | 15s |
| float128 | - |
| cpp_dec_float<32> | 74s |

@cosurgi (Collaborator) commented Jan 18, 2025

Wow, Chris, I did not expect a fully-fledged software package like your awesome https://github.com/ckormanyos/mandelbrot! :-) And it's simple to use, too!

The mystery is solved: g++ does not play well with my 11-year-old CPU, while clang++ somehow manages to squeeze some optimizations in. Having seen the bad results, I decided to try my wife's PC, which was recently upgraded. And it turns out that g++ plays well with a modern CPU. Have a look at these results:

CPU: Intel Xeon E5-2687W v2 (an 11-year-old CPU)

clang++ 14.0.6

| backend | time |
|---|---|
| cpp_double_fp_backend | 18.2s |
| float128_backend | - |
| cpp_dec_float<32> | 57.6s |

g++ 12.2.0

| backend | time |
|---|---|
| cpp_double_fp_backend | 257.1s |
| float128_backend | 41.6s |
| cpp_dec_float<32> | 62.9s |

clang++ 19.1.7

| backend | time |
|---|---|
| cpp_double_fp_backend | 15.8s |
| float128_backend | - |
| cpp_dec_float<32> | 58.8s |

g++ 14.2.0

| backend | time |
|---|---|
| cpp_double_fp_backend | 268.7s |
| float128_backend | 37.0s |
| cpp_dec_float<32> | 64.9s |

CPU: Intel i7-14700KF (a 2-year-old CPU)

clang++ 19.1.4

| backend | time |
|---|---|
| cpp_double_fp_backend | 11.0s |
| float128_backend | - |
| cpp_dec_float<32> | 57.0s |

g++ 12.2.0

| backend | time |
|---|---|
| cpp_double_fp_backend | 13.5s |
| float128_backend | 37.4s |
| cpp_dec_float<32> | 91.6s |

I am skeptical that we could convince the g++ developers to suddenly add better optimization for an 11-year-old CPU. But we know what is going on here, and the mystery is solved.

@ckormanyos (Member, Author) commented Jan 18, 2025

> The mystery is solved: g++ does not play well with my 11-year-old CPU, while clang++ somehow manages to squeeze some optimizations in. Having seen the bad results, I decided to try my wife's PC, which was recently upgraded. And it turns out that g++ plays well with a modern CPU. Have a look at these results:

[snip] Janek then shows good timing results on an alternate PC.

Yeah Janek (@cosurgi), way to stick with it! I am really glad we resolved this little bump-in-the-road-style mystery. Thank you for driving forward with this. I was getting a bit scared.

So here is what we are going to do.

  • I have some modifications and I need to pump up the coverage results again. This is easy.
  • I'll add timing info and experience-reports to the docs.
  • Then I will build and push the docs.
  • Then this thing is good to go.

There is a lot of optimization potential down the road for the double-floating-point backend. At the moment I have only addressed the basic, obvious optimization points. So this thing will get even faster later.

It is, however, somewhat ominous how potentially non-portable the performance boost on this thing may be. So that might lead to some interesting issues down the road.

Anyway, I see no further blocking points regarding forward motion on cpp_double_fp_backend at the moment. So let's finish this thing and move forward!

Cc: @sinandredemption and @jzmaddock

@ckormanyos (Member, Author) commented Jan 18, 2025

> Wow, Chris, I did not expect a fully-fledged software package like your awesome https://github.com/ckormanyos/mandelbrot! :-) And it's simple to use, too!

Yeah Janek (@cosurgi), that thing is one of my retirement toys (in a couple of years). I want to put some of the iterative schemes on a GPU and really hammer down on iterations and fractals. Personally, I struggle with finding good orbits and interesting points to dive down into. But I have a few nice ones.

@ckormanyos (Member, Author)

OK, algebraic calculations are fast. We got this. So I am closing this issue.

@cosurgi (Collaborator) commented Jan 18, 2025

> Personally, I struggle with finding good orbits and interesting points to dive down into. But I have a few nice ones.

As a kid I played a lot with the fractint software. I see it is still around as the Debian package xfractint. It has a decent graphical interface, so you can just point and click to zoom in on interesting areas. It has arbitrary precision too, so you can zoom in really deep, and it gives the exact coordinates. So with the graphical interface you can find interesting coordinates quickly.

@ckormanyos (Member, Author) commented Jan 18, 2025

> As a kid I played a lot with the fractint software. I see it is still around as the Debian package xfractint.

Very cool! Thank you for the tip, Janek.

> So with the graphical interface you can find interesting coordinates quickly.

You might want to try my "MandelbrotDiscovery" in the same repo. But it is Windows-only at the moment.
