Implementation of Ord for integers is suboptimal #63758
Labels
C-enhancement
Category: An issue proposing an enhancement or a PR with one.
E-easy
Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
E-help-wanted
Call for participation: Help is requested to fix this issue.
E-mentor
Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.
I-slow
Issue: Problems and improvements with respect to performance of generated code.
T-libs-api
Relevant to the library API team, which will review and decide on the PR/issue.
The current implementation of Ord::cmp for integral types results in less than optimal code. Currently the implementation looks something like this:
This results in the following IR:
which, on x86, becomes
where the critical path looks like this (courtesy of llvm-mca -mcpu=broadwell):
If the implementation instead was:
the IR would become
which in turn would compile down to
for which the critical path (as expected) only becomes visible on a 2nd iteration:
llvm-mca reports that the reciprocal throughput for the improved version is 1.5 or lower on various x86 architectures (getting as low as 1.3 on znver1; lower is better for reciprocal throughput), whereas the old code always exceeds 2.0, reaching 3.0 on core2.
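The code snippets from the issue body did not survive in this copy. As a rough illustration of the kind of change being discussed, the sketch below contrasts two orderings of the comparisons for u64. It is an assumption that the "current" version tests equality first and the proposed one tests less-than first; the function names cmp_eq_first, cmp_lt_first and agrees_with_std are hypothetical and do not come from the standard library. The sketch only shows that the two shapes are semantically equivalent, and makes no claim about the IR or assembly either one produces.

```rust
use std::cmp::Ordering;

/// Assumed shape of the "current" implementation: equality is tested first.
/// (Hypothetical reconstruction, not copied from the standard library.)
pub fn cmp_eq_first(a: u64, b: u64) -> Ordering {
    if a == b {
        Ordering::Equal
    } else if a < b {
        Ordering::Less
    } else {
        Ordering::Greater
    }
}

/// Assumed shape of the proposed implementation: less-than is tested first.
/// (Hypothetical reconstruction, not copied from the standard library.)
pub fn cmp_lt_first(a: u64, b: u64) -> Ordering {
    if a < b {
        Ordering::Less
    } else if a == b {
        Ordering::Equal
    } else {
        Ordering::Greater
    }
}

/// Returns true if both sketches agree with the built-in `Ord::cmp`
/// for every pair drawn from `samples`.
pub fn agrees_with_std(samples: &[u64]) -> bool {
    samples.iter().all(|&a| {
        samples.iter().all(|&b| {
            let expected = a.cmp(&b);
            cmp_eq_first(a, b) == expected && cmp_lt_first(a, b) == expected
        })
    })
}
```

Since both functions return the same result for every input, any difference between them is purely in the generated code. To compare that, one could emit assembly with rustc -O --emit asm and feed it to llvm-mca with the -mcpu value of interest, as the issue does.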