
Bench Mandelbrot for algebraic speed assessment #190

Closed
ckormanyos opened this issue Jan 17, 2025 · 10 comments
Labels: enhancement, optimization

@ckormanyos (Member)

The purpose of this issue is to do some dedicated algebraic performance testing of the double-float backend against various similar competitors. This topic may have a bit of depth, so we open a separate issue here for the dedicated discussion.
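For a concrete picture of what is being timed: the tests exercise elementary algebra (multiply, add, compare) in a tight loop, essentially the Mandelbrot escape-time iteration. The following is a minimal, illustrative sketch of such a kernel, not the actual YADE or ckormanyos/mandelbrot code; it is templated on the number type so different Boost.Multiprecision backends can be swapped in for a rough comparison.

```cpp
// Illustrative only: a minimal escape-time kernel of the kind such a
// benchmark exercises, templated on the number type so that different
// Boost.Multiprecision backends can be swapped in. Not the actual
// YADE or ckormanyos/mandelbrot code.
#include <boost/multiprecision/cpp_bin_float.hpp>

#include <cstdint>
#include <iostream>

template <typename Real>
std::uint32_t mandel_iters(const Real& cr, const Real& ci, std::uint32_t max_iter)
{
   Real zr(0), zi(0);

   std::uint32_t i = 0U;

   while (i < max_iter)
   {
      const Real zr2 = zr * zr;
      const Real zi2 = zi * zi;

      if (zr2 + zi2 > Real(4)) { break; }   // |z| > 2: the point escapes

      zi = Real(2) * zr * zi + ci;          // Im(z^2 + c)
      zr = zr2 - zi2 + cr;                  // Re(z^2 + c)

      ++i;
   }

   return i;
}

int main()
{
   // Swap in cpp_double_double / cpp_double_long_double from the development
   // branch, or float128, to compare backends on the same kernel.
   using big_float = boost::multiprecision::cpp_bin_float_50;

   std::cout << mandel_iters(big_float("-0.7453"), big_float("0.1127"), 10000U) << '\n';
}
```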

Cc: @cosurgi

@cosurgi (Collaborator) commented Jan 17, 2025

I finally have some good news! Here's the latest YADE benchmark, run with yade -n --quickperformance -j 4. I am working with the cpp_double_fp_backend branch at commit c6ce52d.

The trick was to use a different compiler!

clang++ 19.1.7, -O3

| type | compiler | calculation speed | factor |
|---|---|---|---|
| cpp_double_double | clang++ 19.1.7 | 205.0179 iter/sec | 1 |
| cpp_bin_float<32> | clang++ 19.1.7 | 95.3581 iter/sec | 2.14 |
| cpp_dec_float<31> | clang++ 19.1.7 | 49.1410 iter/sec | 4.17 |
| mpfr_float_backend<31> | clang++ 19.1.7 | 31.7974 iter/sec | 6.44 |

g++ 14.2, -O3 (factors relative to the clang++ cpp_double_double result above)

| type | compiler | calculation speed | factor |
|---|---|---|---|
| float128 | g++ 14.2 | 165.5817 iter/sec | 1.23 |
| cpp_bin_float<32> | g++ 14.2 | 98.4807 iter/sec | 2.08 |
| cpp_dec_float<31> | g++ 14.2 | 51.2811 iter/sec | 3.99 |
| cpp_double_double | g++ 14.2 | 34.2752 iter/sec | 5.98 |
| mpfr_float_backend<31> | g++ 14.2 | 30.6851 iter/sec | 6.68 |

So all other types perform at nearly the same speed with both compilers. The only exception is cpp_double_double, which shows a huge difference in performance: it is six times slower with g++! Is that a problem with g++? What else could it be?
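Purely as a hedged aside (later comments in this thread point at the CPU generation rather than at g++ itself): a double-double type is typically built from a handful of error-free transformations, so its speed hinges almost entirely on how tightly the compiler schedules a few adds and multiplies and on whether a hardware FMA is available and emitted. The sketch below shows the classical building blocks; it is not the cpp_double_fp_backend source, only an illustration of why code-generation differences can swing this particular backend so much while cpp_bin_float, cpp_dec_float, and MPFR, which spend their time in larger routines, barely move.

```cpp
// Classical error-free transformations of the kind a double-double type is
// usually built from (Knuth two_sum, FMA-based two_prod). This is NOT the
// cpp_double_fp_backend source; it only illustrates why the backend's speed
// is so sensitive to code generation: everything reduces to a handful of
// adds/multiplies that must be scheduled tightly, plus (for the product)
// a hardware FMA.
#include <cmath>
#include <utility>

// Returns (s, e) with s + e == a + b exactly, where s = fl(a + b).
inline std::pair<double, double> two_sum(double a, double b)
{
   const double s  = a + b;
   const double bv = s - a;
   const double e  = (a - (s - bv)) + (b - bv);
   return { s, e };
}

// Returns (p, e) with p + e == a * b exactly, where p = fl(a * b).
inline std::pair<double, double> two_prod(double a, double b)
{
   const double p = a * b;
   const double e = std::fma(a, b, -p);   // exact residual via fused multiply-add
   return { p, e };
}
```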

And here are the results for cpp_double_long_double

clang++ 19.1.7, cpp_double_long_double

| type | compiler | calculation speed | factor |
|---|---|---|---|
| cpp_double_long_double | clang++ 19.1.7 | 88.2753 iter/sec | 1 |
| cpp_bin_float<39> | clang++ 19.1.7 | 59.1809 iter/sec | 1.49 |
| cpp_dec_float<39> | clang++ 19.1.7 | 45.8211 iter/sec | 1.92 |
| mpfr_float_backend<39> | clang++ 19.1.7 | 28.8586 iter/sec | 3.05 |

g++ 14.2, cpp_double_long_double

| type | compiler | calculation speed | factor |
|---|---|---|---|
| cpp_bin_float<39> | g++ 14.2 | 60.7156 iter/sec | 1.45 |
| cpp_dec_float<39> | g++ 14.2 | 46.4605 iter/sec | 1.90 |
| mpfr_float_backend<39> | g++ 14.2 | 27.2021 iter/sec | 3.24 |
| cpp_double_long_double | g++ 14.2 | 15.5230 iter/sec | 5.68 |

Again, g++ is six times slower than clang++ for cpp_double_long_double.

So Chris, if you used a different compiler for your Mandelbrot float128 vs. cpp_double_double benchmark, that may explain the discrepancies. But AFAIK only g++ supports float128, so maybe there is still something to explain. But at least I got some good results from clang!

@ckormanyos (Member, Author)

> I finally have some good news!

This is indeed going in the right direction. But the mystery with g++ still confuses us (sadly).

> So all other types perform at nearly the same speed with both compilers. The only exception is cpp_double_double, which shows a huge difference in performance: it is six times slower with g++! Is that a problem with g++? What else could it be?

This is our last really big open issue.

In the next post, I provide my Mandelbrot benchmarks.

@ckormanyos (Member, Author) commented Jan 18, 2025

Hi Janek (@cosurgi)

I've prepared the Mandelbrot benchmark, and you can hopefully run it successfully locally.

In ckormanyos/mandelbrot, you will find the option_cpp_double_double branch.

On commit 40004acef31953f4b25eeb38e452446990ad55f9, the benchmark is ready for your consumption. You will need to make a few tiny adjustments when calling build_all.sh to build for each backend.

Building

First locate build_all.sh. Then you can build in the bash shell with a command like:

./build_all.sh --boost='-I/mnt/c/ChrisGitRepos/boost_gsoc2021/multiprecision/include -I/mnt/c/boost/boost_1_87_0' --my_cc=clang++ --stdcc=gnu++20

You can change the compiler, the Boost location(s), and the language standard on the command line. The order of the parameters does not matter. The default checked in builds for cpp_dec_float<32>.

In order to change to, let's say, cpp_double_double, you must:

  • Go into build_all.sh. There you must, sadly, edit manually.
  • Uncomment the build line with -DMANDELBROT_USE_DOUBLE_DOUBLE (this is line 74).
  • But then DO comment out line 73.
  • You must link with -lquadmath, so comment/uncomment the pair of lines 79/80.
  • Follow a similar procedure for -DMANDELBROT_USE_FLOAT128. A hypothetical sketch of the backend selection is shown after this list.
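To make the macro switching a bit more concrete, here is a hypothetical sketch of how the -DMANDELBROT_USE_* definitions could select the numeric type on the C++ side. The header name boost/multiprecision/cpp_double_fp.hpp and the alias cpp_double_double are assumptions about the development branch, and the actual selection logic in ckormanyos/mandelbrot may differ.

```cpp
// Hypothetical sketch of how the -DMANDELBROT_USE_* switches might map to a
// numeric type; the header/alias names for the double-double backend are
// assumptions about the development branch, and the real selection logic in
// ckormanyos/mandelbrot may differ.
#if defined(MANDELBROT_USE_DOUBLE_DOUBLE)
  #include <boost/multiprecision/cpp_double_fp.hpp>      // assumed header name
  using mandelbrot_real = boost::multiprecision::cpp_double_double;
#elif defined(MANDELBROT_USE_FLOAT128)
  #include <boost/multiprecision/float128.hpp>           // requires GCC and -lquadmath
  using mandelbrot_real = boost::multiprecision::float128;
#else
  #include <boost/multiprecision/cpp_dec_float.hpp>      // default backend
  using mandelbrot_real =
     boost::multiprecision::number<boost::multiprecision::cpp_dec_float<32>>;
#endif
```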

@ckormanyos (Member, Author)

My timings on WSL2 are as follows:

Using g++

| backend | time |
|---|---|
| cpp_double_double | 19s |
| float128 | 41s |
| cpp_dec_float<32> | 95s |

Using clang++

| backend | time |
|---|---|
| cpp_double_double | 15s |
| float128 | - |
| cpp_dec_float<32> | 74s |

@cosurgi (Collaborator) commented Jan 18, 2025

Wow, Chris, I did not expect a fully-fledged software package like your awesome https://github.com/ckormanyos/mandelbrot! :-) And it's simple to use, too!

The mystery is solved: g++ does not play well with my 11-year-old CPU, while clang++ somehow manages to squeeze some optimizations in. Having seen the bad results, I decided to try my wife's PC, which was recently upgraded. And it turns out that g++ plays well with a modern CPU. Have a look at these results:

CPU: Intel Xeon E5-2687W v2 (an 11-year-old CPU)

clang++ 14.0.6

| backend | time |
|---|---|
| cpp_double_fp_backend | 18.2s |
| float128_backend | - |
| cpp_dec_float<32> | 57.6s |

g++ 12.2.0

| backend | time |
|---|---|
| cpp_double_fp_backend | 257.1s |
| float128_backend | 41.6s |
| cpp_dec_float<32> | 62.9s |

clang++ 19.1.7

| backend | time |
|---|---|
| cpp_double_fp_backend | 15.8s |
| float128_backend | - |
| cpp_dec_float<32> | 58.8s |

g++ 14.2.0

| backend | time |
|---|---|
| cpp_double_fp_backend | 268.7s |
| float128_backend | 37.0s |
| cpp_dec_float<32> | 64.9s |

CPU: Intel i7-14700KF (a 2-year-old CPU)

clang++ 19.1.4

| backend | time |
|---|---|
| cpp_double_fp_backend | 11.0s |
| float128_backend | - |
| cpp_dec_float<32> | 57.0s |

g++ 12.2.0

| backend | time |
|---|---|
| cpp_double_fp_backend | 13.5s |
| float128_backend | 37.4s |
| cpp_dec_float<32> | 91.6s |

I am skeptical that we could convince the g++ developers to suddenly add better optimization for an 11-year-old CPU. But we know what is going on here, and the mystery is solved.

@ckormanyos (Member, Author) commented Jan 18, 2025

> The mystery is solved: g++ does not play well with my 11-year-old CPU, while clang++ somehow manages to squeeze some optimizations in. Having seen the bad results, I decided to try my wife's PC, which was recently upgraded. And it turns out that g++ plays well with a modern CPU. Have a look at these results:

[snip] Janek then shows good timing results on an alternate PC.

Yeah Janek (@cosurgi), way to stick with it! I am really glad we resolved this little bump-in-the-road-style mystery. Thank you for driving forward with this. I was getting a bit scared.

So here is what we are going to do.

  • I have some modifications and I need to pump up the coverage results again. This is easy.
  • I'll add timing info and experience-reports to the docs.
  • Then I will build and push the docs.
  • Then this thing is good to go.

There is a lot of optimization potential down the road for the double-floating-point backend. At the moment I have only addressed the basic, obvious optimization points. So this thing will get even faster later.

It is, however, somewhat ominous how potentially non-portable the performance boost on this thing may be. So that might lead to some interesting issues down the road.

Anyway, I see no further blocking points regarding forward motion on cpp_double_fp_backend at the moment. So let's finish this thing and move forward!

Cc: @sinandredemption and @jzmaddock

@ckormanyos (Member, Author) commented Jan 18, 2025

> Wow, Chris, I did not expect a fully-fledged software package like your awesome https://github.com/ckormanyos/mandelbrot! :-) And it's simple to use, too!

Yeah Janek (@cosurgi), that thing is one of my retirement toys (in a couple of years). I want to put some of the iterative schemes on a GPU and really hammer down on iterations and fractals. Personally, I struggle with finding good orbits and interesting points to dive down into. But I have a few nice ones.

@ckormanyos (Member, Author)

OK, algebraic calculations are fast. We got this. So I am closing this issue.

@cosurgi (Collaborator) commented Jan 18, 2025

> Personally, I struggle with finding good orbits and interesting points to dive down into. But I have a few nice ones.

As a kid I played a lot with the fractint software. I see it is still around as the Debian package xfractint. It has a decent graphical interface, so you can just point and click to zoom in on interesting areas. It has arbitrary precision too, so you can zoom in really deep, and it gives the exact coordinates. So with the graphical interface you can find interesting coordinates quickly.

@ckormanyos (Member, Author) commented Jan 18, 2025

> As a kid I played a lot with the fractint software. I see it is still around as the Debian package xfractint.

Very cool! Thank you for the tip, Janek.

> So with the graphical interface you can find interesting coordinates quickly.

You might want to try my "MandelbrotDiscovery" in the same repo. But it is Windows-only at the moment.
