
Trimming out one eight-digit optimization. #152

Merged: @lemire merged 1 commit into main from dlemire/reducing_eight_digit_optimization on Nov 15, 2022

@lemire commented on Nov 15, 2022

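This PR rebalances the performance trade-off by removing one of the eight-digit parsing optimizations: most floating-point benchmarks get faster, while pure integer inputs (simple_int32) get slower. Credit to @mwalcott3 for raising the issue.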
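
The PR text does not include the code. As a rough illustration of what an eight-digit optimization does, here is a minimal C++17 sketch of the SWAR technique: load eight ASCII bytes into one 64-bit word, check that all of them are digits, and convert them with a handful of multiplications instead of one multiply-add per digit. The names (`load8`, `is_eight_digits`, `parse_eight_digits`, `parse_digits`) are hypothetical, this is not fast_float's source, and the code assumes a little-endian target. Which loop keeps the fast path is also an assumption: here the integer part is read one digit at a time and only the fractional part uses eight-digit steps, which would be consistent with the simple_int32 results for this PR being close to the results with all eight-digit optimizations removed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical SWAR helpers (a sketch, not fast_float's source).
// Assumes a little-endian target: the first character lands in the low byte.

// Copy 8 bytes starting at p into a 64-bit word (memcpy avoids aliasing issues).
inline uint64_t load8(const char *p, const char *end) {
  assert(end - p >= 8);
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// True if every byte of v is an ASCII digit '0'..'9' (0x30..0x39).
inline bool is_eight_digits(uint64_t v) {
  return ((v & 0xF0F0F0F0F0F0F0F0) |
          (((v + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0) >> 4)) ==
         0x3333333333333333;
}

// Convert eight ASCII digits (most significant first) to their value.
inline uint32_t parse_eight_digits(uint64_t v) {
  const uint64_t mask = 0x000000FF000000FF;
  const uint64_t mul1 = 0x000F424000000064;  // 100 + (1000000 << 32)
  const uint64_t mul2 = 0x0000271000000001;  // 1 + (10000 << 32)
  v -= 0x3030303030303030;                   // ASCII codes -> digit values
  v = (v * 10) + (v >> 8);                   // even bytes now hold 2-digit values
  v = (((v & mask) * mul1) + (((v >> 16) & mask) * mul2)) >> 32;
  return static_cast<uint32_t>(v);
}

struct Digits {
  uint64_t mantissa;        // all digits with the decimal point dropped
  int64_t fraction_digits;  // number of digits after the '.'
};

inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical layout: scalar loop for the integer part, eight-digit fast
// path only after the decimal point. Assumes at most 19 digits in total so
// the mantissa cannot overflow (real parsers track overflow separately).
inline Digits parse_digits(std::string_view s) {
  const char *p = s.data();
  const char *const end = p + s.size();
  uint64_t m = 0;
  while (p != end && is_digit(*p)) {  // integer part: one digit per step
    m = 10 * m + uint64_t(*p - '0');
    ++p;
  }
  int64_t frac = 0;
  if (p != end && *p == '.') {
    ++p;
    const char *const first = p;
    // Fractional part: eight digits per step while at least 8 bytes remain.
    while (end - p >= 8 && is_eight_digits(load8(p, end))) {
      m = m * 100000000 + parse_eight_digits(load8(p, end));
      p += 8;
    }
    while (p != end && is_digit(*p)) {  // leftover digits, one at a time
      m = 10 * m + uint64_t(*p - '0');
      ++p;
    }
    frac = p - first;
  }
  return {m, frac};
}
```

For example, `parse_digits("3.1415926535")` yields the mantissa 31415926535 with 10 fraction digits: the first eight fractional digits are consumed in a single step and the last two by the scalar loop.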

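We use https://github.com/lemire/simple_fastfloat_benchmark as the reference benchmark. For each platform, results are shown for the main branch, for this PR, and for a build with all eight-digit optimizations removed.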

Using an Apple M2 processor and LLVM 14.

Current code (main branch):

-f data/canada.txt
fastfloat                               :  1289.31 MB/s (+/- 0.9 %)    74.09 Mfloat/s      17.71 i/B   323.23 i/f (+/- 0.0 %)      2.59 c/B    47.30 c/f (+/- 0.6 %)      6.83 i/c      3.50 GHz 
-f data/canada_short.txt
fastfloat                               :   543.83 MB/s (+/- 0.5 %)   101.04 Mfloat/s      41.34 i/B   233.29 i/f (+/- 0.0 %)      6.15 c/B    34.68 c/f (+/- 0.2 %)      6.73 i/c      3.50 GHz 
-f data/mesh.txt
fastfloat                               :   863.78 MB/s (+/- 1.4 %)   117.67 Mfloat/s      26.42 i/B   203.39 i/f (+/- 0.0 %)      3.87 c/B    29.78 c/f (+/- 0.3 %)      6.83 i/c      3.50 GHz 
-m uniform
fastfloat                               :  1741.33 MB/s (+/- 0.4 %)    83.00 Mfloat/s      14.05 i/B   309.04 i/f (+/- 0.0 %)      1.92 c/B    42.22 c/f (+/- 0.2 %)      7.32 i/c      3.50 GHz 
-m uniform -c
fastfloat                               :  1315.29 MB/s (+/- 0.5 %)    75.49 Mfloat/s      14.77 i/B   269.87 i/f (+/- 0.0 %)      2.54 c/B    46.42 c/f (+/- 0.2 %)      5.81 i/c      3.50 GHz 
-m simple_uniform32
fastfloat                               :  1693.26 MB/s (+/- 0.9 %)    80.70 Mfloat/s      14.05 i/B   309.04 i/f (+/- 0.0 %)      1.92 c/B    42.23 c/f (+/- 0.2 %)      7.32 i/c      3.41 GHz 
-m simple_uniform32 -c
fastfloat                               :  1287.28 MB/s (+/- 1.0 %)    73.87 Mfloat/s      14.77 i/B   269.84 i/f (+/- 0.0 %)      2.52 c/B    46.13 c/f (+/- 0.5 %)      5.85 i/c      3.41 GHz 
-m simple_int32
fastfloat                               :  1274.86 MB/s (+/- 1.0 %)   137.18 Mfloat/s      15.83 i/B   154.21 i/f (+/- 0.0 %)      2.55 c/B    24.85 c/f (+/- 0.2 %)      6.21 i/c      3.41 GHz 
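
Each result line reports throughput in MB/s and Mfloat/s, then instructions and cycles per input byte (i/B, c/B) and per parsed float (i/f, c/f), instructions per cycle (i/c) and the measured clock rate. The columns are linked by simple arithmetic, which helps when comparing runs. The C++ sketch below (hypothetical names) recomputes i/c, the clock rate and the average number of bytes per float from one line; the figures above are consistent with MB meaning 2^20 bytes, but that is an inference from the data, not something the PR states.

```cpp
#include <cassert>

// One result line from the benchmark output (hypothetical struct).
struct BenchLine {
  double mb_per_s;          // MB/s (MB assumed to be 2^20 bytes)
  double mfloat_per_s;      // Mfloat/s
  double instr_per_float;   // i/f
  double cycles_per_float;  // c/f
};

// Instructions per cycle: (i/f) / (c/f).
double instructions_per_cycle(const BenchLine &b) {
  assert(b.cycles_per_float > 0);
  return b.instr_per_float / b.cycles_per_float;
}

// Clock rate in GHz: cycles per float times floats per second.
double clock_ghz(const BenchLine &b) {
  return b.cycles_per_float * b.mfloat_per_s * 1e6 / 1e9;
}

// Average input size per parsed float, in bytes.
double bytes_per_float(const BenchLine &b) {
  assert(b.mfloat_per_s > 0);
  return b.mb_per_s * 1048576.0 / (b.mfloat_per_s * 1e6);
}
```

With the canada.txt line from the main branch on the M2 (1289.31 MB/s, 74.09 Mfloat/s, 323.23 i/f, 47.30 c/f), this gives about 6.83 i/c, 3.50 GHz and 18.25 bytes per float, matching the reported i/c and GHz columns and the ratio i/f ÷ i/B = 323.23 / 17.71.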

With this PR:


-f data/canada.txt
fastfloat                               :  1396.42 MB/s (+/- 3.6 %)    80.25 Mfloat/s      15.90 i/B   290.11 i/f (+/- 0.0 %)      2.39 c/B    43.65 c/f (+/- 0.9 %)      6.65 i/c      3.50 GHz 
-f data/canada_short.txt
fastfloat                               :   616.57 MB/s (+/- 0.4 %)   114.56 Mfloat/s      37.59 i/B   212.15 i/f (+/- 0.0 %)      5.27 c/B    29.75 c/f (+/- 0.2 %)      7.13 i/c      3.41 GHz 
-f data/mesh.txt
fastfloat                               :   871.20 MB/s (+/- 1.3 %)   118.68 Mfloat/s      24.01 i/B   184.79 i/f (+/- 0.0 %)      3.72 c/B    28.62 c/f (+/- 1.0 %)      6.46 i/c      3.40 GHz 
-m uniform
fastfloat                               :  1947.18 MB/s (+/- 2.0 %)    92.81 Mfloat/s      12.59 i/B   277.04 i/f (+/- 0.0 %)      1.72 c/B    37.76 c/f (+/- 0.3 %)      7.34 i/c      3.50 GHz 
-m uniform -c
fastfloat                               :  1453.48 MB/s (+/- 1.5 %)    83.42 Mfloat/s      13.08 i/B   238.98 i/f (+/- 0.0 %)      2.30 c/B    42.01 c/f (+/- 0.8 %)      5.69 i/c      3.50 GHz 
-m simple_uniform32
fastfloat                               :  1937.96 MB/s (+/- 2.9 %)    92.37 Mfloat/s      12.59 i/B   277.04 i/f (+/- 0.0 %)      1.72 c/B    37.76 c/f (+/- 1.0 %)      7.34 i/c      3.49 GHz 
-m simple_uniform32 -c
fastfloat                               :  1405.33 MB/s (+/- 1.0 %)    80.66 Mfloat/s      13.08 i/B   239.02 i/f (+/- 0.0 %)      2.31 c/B    42.25 c/f (+/- 0.2 %)      5.66 i/c      3.41 GHz 
-m simple_int32
fastfloat                               :   999.56 MB/s (+/- 1.1 %)   107.60 Mfloat/s      20.16 i/B   196.41 i/f (+/- 0.0 %)      3.25 c/B    31.67 c/f (+/- 0.2 %)      6.20 i/c      3.41 GHz 

Removing all eight-digit optimizations:

-f data/canada.txt
fastfloat                               :  1078.35 MB/s (+/- 0.6 %)    61.97 Mfloat/s      17.61 i/B   321.38 i/f (+/- 0.0 %)      3.10 c/B    56.55 c/f (+/- 0.4 %)      5.68 i/c      3.50 GHz 
-f data/canada_short.txt
fastfloat                               :   637.69 MB/s (+/- 0.4 %)   118.48 Mfloat/s      36.88 i/B   208.15 i/f (+/- 0.0 %)      5.24 c/B    29.58 c/f (+/- 0.2 %)      7.04 i/c      3.50 GHz 
-f data/mesh.txt
fastfloat                               :   828.19 MB/s (+/- 1.0 %)   112.82 Mfloat/s      25.52 i/B   196.46 i/f (+/- 0.0 %)      4.00 c/B    30.80 c/f (+/- 1.7 %)      6.38 i/c      3.47 GHz 
-m uniform
fastfloat                               :  1330.99 MB/s (+/- 2.4 %)    63.44 Mfloat/s      16.41 i/B   361.04 i/f (+/- 0.0 %)      2.51 c/B    55.14 c/f (+/- 1.6 %)      6.55 i/c      3.50 GHz 
-m uniform -c
fastfloat                               :  1016.28 MB/s (+/- 1.6 %)    58.32 Mfloat/s      17.60 i/B   321.62 i/f (+/- 0.0 %)      3.20 c/B    58.44 c/f (+/- 0.7 %)      5.50 i/c      3.41 GHz 
-m simple_uniform32
fastfloat                               :  1208.98 MB/s (+/- 4.8 %)    57.62 Mfloat/s      16.41 i/B   361.04 i/f (+/- 0.0 %)      2.53 c/B    55.55 c/f (+/- 2.2 %)      6.50 i/c      3.20 GHz 
-m simple_uniform32 -c
fastfloat                               :  1016.17 MB/s (+/- 2.5 %)    58.32 Mfloat/s      17.60 i/B   321.59 i/f (+/- 0.0 %)      3.20 c/B    58.44 c/f (+/- 0.8 %)      5.50 i/c      3.41 GHz 
-m simple_int32
fastfloat                               :   998.67 MB/s (+/- 1.6 %)   107.50 Mfloat/s      20.16 i/B   196.41 i/f (+/- 0.0 %)      3.25 c/B    31.70 c/f (+/- 0.4 %)      6.20 i/c      3.41 GHz 

Using GCC 11 and an Intel Ice Lake (server) processor.

Current code (main branch):

-f data/canada.txt
fastfloat                               :   893.05 MB/s (+/- 1.0 %)    51.32 Mfloat/s      14.58 i/B   266.00 i/f (+/- 0.0 %)      2.78 c/B    50.66 c/f (+/- 0.4 %)      5.25 i/c      2.60 GHz 
-f data/canada_short.txt
fastfloat                               :   438.03 MB/s (+/- 1.0 %)    81.39 Mfloat/s      31.09 i/B   175.45 i/f (+/- 0.0 %)      5.64 c/B    31.85 c/f (+/- 0.6 %)      5.51 i/c      2.59 GHz 
-f data/mesh.txt
fastfloat                               :   739.34 MB/s (+/- 1.1 %)   100.72 Mfloat/s      19.49 i/B   150.06 i/f (+/- 0.0 %)      3.35 c/B    25.82 c/f (+/- 0.4 %)      5.81 i/c      2.60 GHz 
-m uniform
fastfloat                               :  1250.07 MB/s (+/- 0.6 %)    59.58 Mfloat/s      11.14 i/B   245.05 i/f (+/- 0.0 %)      1.98 c/B    43.63 c/f (+/- 0.1 %)      5.62 i/c      2.60 GHz 
-m uniform -c
fastfloat                               :  1090.83 MB/s (+/- 1.3 %)    62.61 Mfloat/s      11.12 i/B   203.19 i/f (+/- 0.0 %)      2.27 c/B    41.53 c/f (+/- 0.8 %)      4.89 i/c      2.60 GHz 
-m simple_uniform32
fastfloat                               :  1250.43 MB/s (+/- 0.7 %)    59.60 Mfloat/s      11.14 i/B   245.05 i/f (+/- 0.0 %)      1.98 c/B    43.63 c/f (+/- 0.1 %)      5.62 i/c      2.60 GHz 
-m simple_uniform32 -c
fastfloat                               :  1087.70 MB/s (+/- 1.1 %)    62.44 Mfloat/s      11.12 i/B   203.13 i/f (+/- 0.0 %)      2.28 c/B    41.58 c/f (+/- 0.7 %)      4.89 i/c      2.60 GHz 
-m simple_int32
fastfloat                               :  1204.84 MB/s (+/- 1.5 %)   129.66 Mfloat/s      11.32 i/B   110.33 i/f (+/- 0.0 %)      2.06 c/B    20.05 c/f (+/- 0.8 %)      5.50 i/c      2.60 GHz 

With this PR:

-f data/canada.txt
fastfloat                               :   913.96 MB/s (+/- 0.8 %)    52.52 Mfloat/s      13.48 i/B   245.99 i/f (+/- 0.0 %)      2.71 c/B    49.50 c/f (+/- 0.4 %)      4.97 i/c      2.60 GHz 
-f data/canada_short.txt
fastfloat                               :   447.29 MB/s (+/- 0.7 %)    83.11 Mfloat/s      29.67 i/B   167.45 i/f (+/- 0.0 %)      5.54 c/B    31.29 c/f (+/- 0.2 %)      5.35 i/c      2.60 GHz 
-f data/mesh.txt
fastfloat                               :   757.96 MB/s (+/- 1.2 %)   103.25 Mfloat/s      18.55 i/B   142.81 i/f (+/- 0.0 %)      3.27 c/B    25.18 c/f (+/- 0.5 %)      5.67 i/c      2.60 GHz 
-m uniform
fastfloat                               :  1308.01 MB/s (+/- 0.7 %)    62.34 Mfloat/s      10.41 i/B   229.05 i/f (+/- 0.0 %)      1.90 c/B    41.70 c/f (+/- 0.1 %)      5.49 i/c      2.60 GHz 
-m uniform -c
fastfloat                               :  1157.90 MB/s (+/- 4.2 %)    66.44 Mfloat/s      10.03 i/B   183.23 i/f (+/- 0.0 %)      2.14 c/B    39.13 c/f (+/- 1.2 %)      4.68 i/c      2.60 GHz 
-m simple_uniform32
fastfloat                               :  1310.04 MB/s (+/- 0.5 %)    62.44 Mfloat/s      10.41 i/B   229.05 i/f (+/- 0.0 %)      1.89 c/B    41.64 c/f (+/- 0.1 %)      5.50 i/c      2.60 GHz 
-m simple_uniform32 -c
fastfloat                               :  1166.98 MB/s (+/- 2.1 %)    66.98 Mfloat/s      10.02 i/B   183.13 i/f (+/- 0.0 %)      2.12 c/B    38.80 c/f (+/- 1.6 %)      4.72 i/c      2.60 GHz 
-m simple_int32
fastfloat                               :   743.80 MB/s (+/- 0.7 %)    80.05 Mfloat/s      17.16 i/B   167.17 i/f (+/- 0.0 %)      3.33 c/B    32.48 c/f (+/- 0.2 %)      5.15 i/c      2.60 GHz 

Removing all eight-digit optimizations:

-f data/canada.txt
fastfloat                               :   682.51 MB/s (+/- 0.7 %)    39.22 Mfloat/s      17.34 i/B   316.36 i/f (+/- 0.0 %)      3.62 c/B    66.13 c/f (+/- 0.5 %)      4.78 i/c      2.59 GHz 
-f data/canada_short.txt
fastfloat                               :   476.84 MB/s (+/- 0.9 %)    88.60 Mfloat/s      29.32 i/B   165.45 i/f (+/- 0.0 %)      5.20 c/B    29.34 c/f (+/- 0.3 %)      5.64 i/c      2.60 GHz 
-f data/mesh.txt
fastfloat                               :   591.76 MB/s (+/- 1.5 %)    80.61 Mfloat/s      21.66 i/B   166.73 i/f (+/- 0.0 %)      4.19 c/B    32.25 c/f (+/- 0.8 %)      5.17 i/c      2.60 GHz 
-m uniform
fastfloat                               :   709.28 MB/s (+/- 0.4 %)    33.81 Mfloat/s      16.91 i/B   372.05 i/f (+/- 0.0 %)      3.49 c/B    76.82 c/f (+/- 0.1 %)      4.84 i/c      2.60 GHz 
-m uniform -c
fastfloat                               :   642.08 MB/s (+/- 2.5 %)    36.85 Mfloat/s      17.48 i/B   319.38 i/f (+/- 0.0 %)      3.86 c/B    70.44 c/f (+/- 2.3 %)      4.53 i/c      2.60 GHz 
-m simple_uniform32
fastfloat                               :   711.33 MB/s (+/- 0.5 %)    33.90 Mfloat/s      16.91 i/B   372.05 i/f (+/- 0.0 %)      3.48 c/B    76.61 c/f (+/- 0.2 %)      4.86 i/c      2.60 GHz 
-m simple_uniform32 -c
fastfloat                               :   641.86 MB/s (+/- 2.7 %)    36.83 Mfloat/s      17.48 i/B   319.42 i/f (+/- 0.0 %)      3.86 c/B    70.52 c/f (+/- 2.4 %)      4.53 i/c      2.60 GHz 
-m simple_int32
fastfloat                               :   757.69 MB/s (+/- 0.8 %)    81.55 Mfloat/s      16.85 i/B   164.17 i/f (+/- 0.0 %)      3.27 c/B    31.88 c/f (+/- 0.3 %)      5.15 i/c      2.60 GHz 
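
To summarize the trade-off, the following C++ sketch computes the relative throughput change (MB/s) from the main branch to this PR for a few workloads, using figures copied from the runs above. The `Row` type, the `print_summary` helper and the choice of rows are illustrative.

```cpp
#include <cassert>
#include <cstdio>

// Relative change in percent from `before` to `after`.
double percent_change(double before, double after) {
  assert(before != 0.0);
  return 100.0 * (after - before) / before;
}

struct Row {
  const char *platform;
  const char *workload;
  double main_mbs;  // MB/s on the main branch
  double pr_mbs;    // MB/s with this PR
};

// A subset of the MB/s figures copied from the runs above.
const Row rows[] = {
    {"M2/LLVM 14", "canada.txt", 1289.31, 1396.42},
    {"M2/LLVM 14", "uniform", 1741.33, 1947.18},
    {"M2/LLVM 14", "simple_int32", 1274.86, 999.56},
    {"Ice Lake/GCC 11", "canada.txt", 893.05, 913.96},
    {"Ice Lake/GCC 11", "uniform", 1250.07, 1308.01},
    {"Ice Lake/GCC 11", "simple_int32", 1204.84, 743.80},
};

// Print one line per row: platform, workload, signed percent change.
void print_summary() {
  for (const Row &r : rows) {
    std::printf("%-16s %-13s %+6.1f%%\n", r.platform, r.workload,
                percent_change(r.main_mbs, r.pr_mbs));
  }
}
```

Rounded, this gives +8.3% (canada.txt), +11.8% (uniform) and -21.6% (simple_int32) on the M2, and +2.3%, +4.6% and -38.3% on Ice Lake.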

Relates to #151

@lemire merged commit eddf6df into main on Nov 15, 2022
@lemire deleted the dlemire/reducing_eight_digit_optimization branch on January 28, 2023 at 01:50