Cody-Waite argument reduction for trigonometric functions #4112

Merged 17 commits on Oct 11, 2024

@pleroy (Member) commented on Oct 9, 2024

Benchmarks:

2024-10-11T15:14:59+02:00
Running C:\Users\phl\Projects\GitHub\Principia\Principia\Release\x64\benchmarks.exe
Run on (48 X 3793 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 512 KiB (x24)
  L3 Unified 32768 KiB (x4)
------------------------------------------------------------------------------------------------------
Benchmark                                                            Time             CPU   Iterations
------------------------------------------------------------------------------------------------------
BM_EvaluateElementaryFunction<Metric::Latency, std::sin>          13.4 ns         13.4 ns     56000000 cycles: 50.6264
BM_EvaluateElementaryFunction<Metric::Throughput, std::sin>       3.88 ns         3.84 ns    179200000 cycles: 14.6875
BM_EvaluateElementaryFunction<Metric::Latency, cr_sin>            32.6 ns         32.2 ns     21334000 cycles: 123.51
BM_EvaluateElementaryFunction<Metric::Throughput, cr_sin>         21.4 ns         21.8 ns     34462000 cycles: 81.165
BM_EvaluateElementaryFunction<Metric::Latency, Sin>               16.1 ns         16.0 ns     44800000 cycles: 60.9541
BM_EvaluateElementaryFunction<Metric::Throughput, Sin>            7.18 ns         7.25 ns    112000000 cycles: 27.202
BM_EvaluateElementaryFunction<Metric::Latency, std::cos>          13.8 ns         13.8 ns     49778000 cycles: 52.2563
BM_EvaluateElementaryFunction<Metric::Throughput, std::cos>       3.97 ns         3.90 ns    172308000 cycles: 15.0264
BM_EvaluateElementaryFunction<Metric::Latency, cr_cos>            35.9 ns         36.1 ns     19479000 cycles: 136.108
BM_EvaluateElementaryFunction<Metric::Throughput, cr_cos>         22.3 ns         22.0 ns     29867000 cycles: 84.3854
BM_EvaluateElementaryFunction<Metric::Latency, Cos>               14.7 ns         14.6 ns     44800000 cycles: 55.8376
BM_EvaluateElementaryFunction<Metric::Throughput, Cos>            7.12 ns         7.15 ns     89600000 cycles: 26.9696

#1760.

  x_reduced.error = 0;
  quadrant = 0;
} else if (x <= π_over_2_threshold && x >= -π_over_2_threshold) {
  std::int64_t const n = _mm_cvtsd_si64(_mm_set_sd(x * (2 / π)));
Member

Add a comment noting that it so happens that 2 / π is correctly rounded.

Member Author

Added a comment explaining why we don't care too much about the rounding here.
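
For readers unfamiliar with the technique in the PR title, here is a minimal sketch of a two-term Cody-Waite reduction. It is not Principia's code: the names `ReducedAngle`, `CodyWaiteReduce` and `SketchSin` are hypothetical, `std::llrint` stands in for the `_mm_cvtsd_si64` intrinsic of the snippet above, and the split of π/2 is the classic 33-bit one found in fdlibm-derived libraries (the thread does not say which split the PR uses). One plausible reason the rounding of `2 / π` matters little, offered as an interpretation rather than a quote of the PR's comment: the product only selects the integer `n`, so a last-bit error can at worst pick a neighbouring `n` near a half-integer boundary, which leaves the reduced argument marginally outside [-π/4, π/4].

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type: x = value + n * π/2, with quadrant = n mod 4.
struct ReducedAngle {
  double value;
  int quadrant;
};

// Assumption: |x| is small enough that n fits in about 20 bits, so that
// n * c1 below is computed without rounding error.
ReducedAngle CodyWaiteReduce(double const x) {
  // 2/π rounded to double (bit pattern 0x3FE45F306DC9C883).
  constexpr double two_over_pi = 6.36619772367581382433e-01;
  // c1 holds the leading 33 bits of π/2 (0x3FF921FB54400000); its trailing
  // significand bits are zero, which is what makes n * c1 exact.
  constexpr double c1 = 1.57079632673412561417e+00;
  // c2 is π/2 - c1 rounded to double (0x3DD0B4611A626331).
  constexpr double c2 = 6.07710050650619224932e-11;

  // Round to the nearest integer in the current (default) rounding mode.
  std::int64_t const n = std::llrint(x * two_over_pi);
  assert(n > -(1 << 20) && n < (1 << 20));
  double const n_double = static_cast<double>(n);

  // Subtract the exact product first, then the small correction term.
  double const value = (x - n_double * c1) - n_double * c2;
  // n & 3 is n mod 4 in [0, 3], including for negative n.
  return {value, static_cast<int>(n & 3)};
}

// Illustration only: dispatch on the quadrant using the library functions.
double SketchSin(double const x) {
  ReducedAngle const r = CodyWaiteReduce(x);
  switch (r.quadrant) {
    case 0: return std::sin(r.value);
    case 1: return std::cos(r.value);
    case 2: return -std::sin(r.value);
    default: return -std::cos(r.value);
  }
}
```

The ordering of the two subtractions is the point of the scheme: the large term cancels exactly, and only the small `n_double * c2` correction incurs rounding.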

  quadrant = 0;
} else if (x <= π_over_2_threshold && x >= -π_over_2_threshold) {
  std::int64_t const n = _mm_cvtsd_si64(_mm_set_sd(x * (2 / π)));
  double const n_double = static_cast<double>(n);
Member

Is this faster than computing it using _mm_floor_sd?

Member Author

It is actually significantly faster, shaving 7 cycles on Sin and 5 on Cos. The code is rather ugly, though. Updated the benchmark numbers.
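
The two ways of obtaining `n_double` discussed here can be sketched side by side. This is an illustration under assumptions, not the code from the PR: `RoundViaConversion` and `RoundViaFloor` are hypothetical names, `std::floor(t + 0.5)` is only a portable stand-in for whatever a `_mm_floor_sd`-based version would look like, and `std::llrint` replaces the intrinsic on non-x86-64 targets. The sketch makes no claim about speed; it only shows that the two roundings agree except on exact ties.

```cpp
#include <cmath>
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2: _mm_set_sd, _mm_cvtsd_si64.
#endif

// Round to nearest through a double -> int64 -> double round trip, as in the
// reviewed snippet.  The conversion uses the current rounding mode, which is
// round-to-nearest-even by default.
double RoundViaConversion(double const t) {
#if defined(__x86_64__) || defined(_M_X64)
  std::int64_t const n = _mm_cvtsd_si64(_mm_set_sd(t));
#else
  std::int64_t const n = std::llrint(t);  // Portable stand-in.
#endif
  return static_cast<double>(n);
}

// Hypothetical floor-based alternative that stays in floating point.  It
// rounds ties upward instead of to even.
double RoundViaFloor(double const t) {
  return std::floor(t + 0.5);
}
```

For argument reduction the difference on ties is harmless, since either neighbouring `n` yields a reduced argument of magnitude about π/4.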

@pleroy merged commit ab9f43c into mockingbirdnest:master on Oct 11, 2024
8 checks passed