Improve argument reduction #4158

Merged: pleroy merged 5 commits into mockingbirdnest:master on Jan 7, 2025

@pleroy pleroy commented Jan 6, 2025

Replace the roundsd instruction with an addition followed by a subtraction to drop the fractional part. This in turn makes it possible to use FMA in two places.
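
To illustrate the add-and-subtract trick, here is a minimal C++ sketch. It is not the code from this PR: the names RoundToNearestInteger and ReduceByHalfPi, the constants, and the placement of the FMAs are illustrative assumptions. It also assumes the default round-to-nearest mode, strict IEEE double arithmetic (no -ffast-math), and arguments of moderate magnitude.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical constant: 1.5 * 2^52.  Doubles near this value have a spacing
// of exactly 1, so adding it to a small x rounds away the fractional bits.
constexpr double kRoundingShifter = 6755399441055744.0;
constexpr double kTwoTo51 = 2251799813685248.0;

// Rounds x to the nearest integer (ties to even) without a rounding
// instruction.  Valid only for |x| < 2^51 in round-to-nearest mode.
inline double RoundToNearestInteger(double x) {
  assert(std::abs(x) < kTwoTo51);
  return (x + kRoundingShifter) - kRoundingShifter;
}

struct Reduced {
  double n;  // Number of multiples of π/2 removed, as an integral double.
  double r;  // Reduced argument, approximately x - n * π/2.
};

// Two-term reduction by π/2.  Because the rounding starts with an addition,
// the multiplication by 2/π can be fused into it; the subtraction of
// n * π/2 is fused as well.  A two-term split of π/2 is only accurate for
// moderately sized x; this sketch does not handle large arguments.
inline Reduced ReduceByHalfPi(double x) {
  constexpr double kTwoOverPi = 0.6366197723675814;     // 2/π as a double.
  constexpr double kHalfPiHigh = 1.5707963267948966;    // π/2 as a double.
  constexpr double kHalfPiLow = 6.123233995736766e-17;  // π/2 - kHalfPiHigh.
  assert(std::abs(x) < kTwoTo51);
  // Fused multiply-add, then subtract the shifter: n = round(x * 2/π).
  double const n =
      std::fma(x, kTwoOverPi, kRoundingShifter) - kRoundingShifter;
  // Remove n * π/2 in two fused steps, high part first.
  double r = std::fma(-n, kHalfPiHigh, x);
  r = std::fma(-n, kHalfPiLow, r);
  return {n, r};
}
```

For example, ReduceByHalfPi(10.0) yields n = 6 and r close to 10 - 3π. Note that ties round to even here (2.5 becomes 2), which is harmless for argument reduction because either neighbouring multiple of π/2 leaves a reduced argument of the same magnitude.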

Comparison on Zen 3. Before:

RAW TSC:                         min      1‰      1%      5%     10%     25%     50%
            identity            4.94   +0.00   +0.00   +0.00   +0.38   +0.38   +0.38
    sqrtps_xmm0_xmm0           16.72   +0.38   +0.38   +0.38   +0.38   +0.38   +0.38
     mulsd_xmm0_xmm0            7.60   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
  mulsd_xmm0_xmm0_4x           15.20   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
Slope: 1.186186 cycle/TSC
Correlation coefficient: 0.999887
Cycles:             expected     min      1‰      1%      5%     10%     25%     50%
R           identity       0   -0.07   +0.00   +0.00   +0.00   +0.45   +0.45   +0.45
R    mulsd_xmm0_xmm0       3    3.08   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
R mulsd_xmm0_xmm0_4x      12   12.10   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
       principia_cos           66.19   +0.00   +0.45   +0.45   +0.45   +0.90   +0.90
       principia_sin           67.09   +0.45   +0.45   +0.45   +0.45   +0.90   +0.90
R   sqrtps_xmm0_xmm0      14   13.90   +0.45   +0.45   +0.45   +0.45   +0.45   +0.45
             std_cos           54.47   +0.45   +0.45   +0.45   +0.45   +0.90   +0.90
             std_sin           63.03   +0.00   +0.00   +0.00   +0.00   +0.00   +0.45

After:

RAW TSC:                         min      1‰      1%      5%     10%     25%     50%
            identity            4.94   +0.00   +0.00   +0.00   +0.38   +0.38   +0.38
    sqrtps_xmm0_xmm0           16.72   +0.38   +0.38   +0.38   +0.38   +0.38   +0.38
     mulsd_xmm0_xmm0            7.60   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
  mulsd_xmm0_xmm0_4x           15.20   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
Slope: 1.186186 cycle/TSC
Correlation coefficient: 0.999887
Cycles:             expected     min      1‰      1%      5%     10%     25%     50%
R           identity       0   -0.07   +0.00   +0.00   +0.00   +0.45   +0.45   +0.45
R    mulsd_xmm0_xmm0       3    3.08   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
R mulsd_xmm0_xmm0_4x      12   12.10   +0.00   +0.00   +0.00   +0.00   +0.00   +0.00
       principia_cos           65.73   +0.00   +0.00   +0.45   +0.45   +0.90   +0.90
       principia_sin           64.83   +1.35   +1.80   +1.80   +1.80   +2.25   +2.25
R   sqrtps_xmm0_xmm0      14   13.90   +0.45   +0.45   +0.45   +0.45   +0.45   +0.45
             std_cos           55.37   +0.45   +0.90   +0.90   +0.90   +0.90   +0.90
             std_sin           63.03   +0.00   +0.00   +0.00   +0.00   +0.00   +0.45

Comparison on Golden Cove. Before:

RAW TSC:                         min      1‰      1%      5%     10%     25%     50%
            identity            6.82   +0.16   +0.18   +0.20   +0.22   +0.24   +1.72
    sqrtps_xmm0_xmm0           27.04   +0.22   +0.26   +0.26   +0.28   +0.32   +0.36
     mulsd_xmm0_xmm0           14.04   +0.14   +0.18   +0.62   +0.64   +0.66   +0.70
  mulsd_xmm0_xmm0_4x           32.34   +0.12   +0.20   +1.30   +1.32   +1.36   +1.42
Slope: 0.623174 cycle/TSC
Correlation coefficient: 0.998850
Cycles:             expected     min      1‰      1%      5%     10%     25%     50%
R           identity       0   -0.26   +0.10   +0.11   +0.12   +0.14   +0.15   +0.16
R    mulsd_xmm0_xmm0       4    4.26   +0.34   +0.36   +0.37   +0.39   +0.40   +0.42
R mulsd_xmm0_xmm0_4x      16   15.65   +0.10   +0.79   +0.82   +0.82   +0.85   +0.88
       principia_cos           71.20   +0.90   +1.45   +2.18   +2.58   +3.10   +3.60
       principia_sin           75.37   +0.70   +0.91   +1.11   +1.21   +1.41   +1.65
R   sqrtps_xmm0_xmm0      12   12.39   +0.10   +0.12   +0.14   +0.15   +0.17   +0.20
             std_cos           54.51   +1.10   +1.27   +1.38   +1.43   +1.53   +1.66
             std_sin           64.19   +0.14   +0.20   +0.26   +0.30   +0.37   +0.46

After:

RAW TSC:                         min      1‰      1%      5%     10%     25%     50%
            identity            6.74   +0.22   +0.24   +0.26   +0.28   +0.30   +0.32
    sqrtps_xmm0_xmm0           26.94   +0.32   +0.34   +0.36   +0.38   +0.42   +0.46
     mulsd_xmm0_xmm0           14.04   +0.14   +0.58   +0.62   +0.64   +0.66   +0.70
  mulsd_xmm0_xmm0_4x           32.36   +0.10   +1.22   +1.28   +1.30   +1.34   +1.40
Slope: 0.622294 cycle/TSC
Correlation coefficient: 0.998924
Cycles:             expected     min      1‰      1%      5%     10%     25%     50%
R           identity       0   -0.23   +0.11   +0.11   +0.12   +0.14   +0.15   +0.16
R    mulsd_xmm0_xmm0       4    4.30   +0.06   +0.09   +0.36   +0.37   +0.39   +0.41
R mulsd_xmm0_xmm0_4x      16   15.64   +0.10   +0.19   +0.83   +0.85   +0.87   +0.91
       principia_cos           69.40   +0.35   +0.54   +0.71   +0.80   +0.97   +1.17
       principia_sin           70.49   +0.41   +0.58   +0.73   +0.82   +0.97   +1.15
R   sqrtps_xmm0_xmm0      12   12.44   +0.06   +0.09   +0.10   +0.11   +0.12   +0.15
             std_cos           55.23   +0.42   +0.52   +0.62   +0.68   +0.77   +0.88
             std_sin           64.13   +0.12   +0.20   +0.26   +0.30   +0.37   +0.46
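
The Slope and Correlation coefficient lines in the tables above appear consistent with an ordinary least-squares fit of the expected cycle counts of the reference rows (marked R) against their raw TSC minima. This is inferred from the numbers, not stated in the PR. The following C++ sketch, with hypothetical names LinearFit, FitLeastSquares and ToCycles, shows such a fit; fed the four Zen 3 reference points, it gives a slope of about 1.1862 and a correlation of about 0.9999.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Result of fitting y ≈ slope * x + intercept.
struct LinearFit {
  double slope;
  double intercept;
  double correlation;
};

// Ordinary least-squares fit of y against x.  Assumes that x and y have the
// same size, at least two points, and that neither is constant.
inline LinearFit FitLeastSquares(std::vector<double> const& x,
                                 std::vector<double> const& y) {
  std::size_t const n = x.size();
  double mean_x = 0.0;
  double mean_y = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    mean_x += x[i];
    mean_y += y[i];
  }
  mean_x /= static_cast<double>(n);
  mean_y /= static_cast<double>(n);

  // Centred sums of squares and cross products.
  double sxx = 0.0;
  double syy = 0.0;
  double sxy = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    double const dx = x[i] - mean_x;
    double const dy = y[i] - mean_y;
    sxx += dx * dx;
    syy += dy * dy;
    sxy += dx * dy;
  }

  double const slope = sxy / sxx;
  return {slope, mean_y - slope * mean_x, sxy / std::sqrt(sxx * syy)};
}

// Converts a raw TSC measurement to estimated cycles using the fit.
inline double ToCycles(LinearFit const& fit, double tsc) {
  return fit.slope * tsc + fit.intercept;
}
```

With the Zen 3 inputs (TSC minima 4.94, 16.72, 7.60, 15.20 against expected cycles 0, 14, 3, 12), ToCycles maps the mulsd_xmm0_xmm0 measurement of 7.60 TSC to roughly 3.08 cycles, in line with the table.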

pleroy added 5 commits January 1, 2025 02:32
…y pessimizing.

Numbers on Golden Cove.  Before:
       principia_cos           71.25   +0.67   +1.17   +1.83   +2.27   +2.80   +3.29
       principia_sin           75.37   +0.66   +0.92   +1.18   +1.36   +9.97  +10.21
After:
       principia_cos           69.01   +0.47   +0.66   +0.82   +0.92   +1.07   +1.25
       principia_sin           70.20   +0.37   +0.56   +0.71   +0.81   +0.95   +1.17

Unfortunately, on Zen 3 this is a pessimization.  Before:
       principia_cos           65.73   +0.45   +0.90   +0.90   +0.90   +1.35   +1.35
       principia_sin           66.19   +1.35   +1.35   +1.35   +1.35   +1.80   +1.80
After:
       principia_cos           68.44   +0.45   +0.45   +0.45   +0.45   +0.90   +0.90
       principia_sin           67.99   +1.35   +1.35   +1.35   +1.80   +1.80   +2.25
@eggrobin eggrobin added the LGTM label Jan 7, 2025
@pleroy pleroy merged commit 535e54c into mockingbirdnest:master Jan 7, 2025
7 checks passed
2 participants