Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of Ord for integers is suboptimal #63758

Closed
nagisa opened this issue Aug 20, 2019 · 0 comments · Fixed by #63767
Closed

Implementation of Ord for integers is suboptimal #63758

nagisa opened this issue Aug 20, 2019 · 0 comments · Fixed by #63767
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-help-wanted Call for participation: Help is requested to fix this issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@nagisa
Copy link
Member

nagisa commented Aug 20, 2019

The current implementation of Ord::cmp for integral types results in less than optimal code. Currently the implementation looks something like this:

fn cmp(&self, other: &$t) -> Ordering {
    if *self == *other { Equal }
    else if *self < *other { Less }
    else { Greater }
}

This results in the following IR:

  %2 = icmp eq i64 %0, %1
  %3 = icmp ult i64 %0, %1
  %..i = select i1 %3, i8 -1, i8 1
  %_0.0.i = select i1 %2, i8 0, i8 %..i
  ret i8 %_0.0.i

which, on x86, becomes

	cmpq	%rsi, %rdi
	setae	%al
	cmpq	%rsi, %rdi
	je	.LBB0_1
	addb	%al, %al
	addb	$-1, %al
        ; exit (retq)
    .LBB0_1:
	xorl	%eax, %eax
        ; exit (retq)

where the critical path looks like this (courtesy of llvm-mca -mcpu=broadwell):

              Instruction                                 Dependency Information
 +----< 0.    cmpq	%rsi, %rdi
 +----> 1.    setae	%al                               ## REGISTER dependency:  %flags
 |      2.    cmpq	%rsi, %rdi
 |      3.    je	.LBB0_1
 +----> 4.    addb	%al, %al                          ## REGISTER dependency:  %al
 +----> 5.    addb	$-1, %al                          ## REGISTER dependency:  %al
 |      6.    ; exit
 |      7.    xorl	%eax, %eax
 |      8.    ; exit
 |
 |    < loop carried > 
 |
 +----> 2.    cmpq	%rsi, %rdi                        ## RESOURCE interference:  BWPort1 [ probability: 25% ]

if the implementation instead was:

    if a < b {
        std::cmp::Ordering::Less
    } else if a > b {
        std::cmp::Ordering::Greater
    } else {
        std::cmp::Ordering::Equal
    }

the IR would become

  %0 = icmp ult i64 %a, %b
  %1 = icmp ugt i64 %a, %b
  %. = zext i1 %1 to i8
  %_0.0 = select i1 %0, i8 -1, i8 %.
  ret i8 %_0.0

which in turn would compile down to

	movb	$-1, %al
	cmpq	%rsi, %rdi
	seta	%cl
	jb	.LBB1_2
	movl	%ecx, %eax
    .LBB1_2:
        ; exit (retq)

for which the critical path (as expected) only becomes visible on a 2nd iteration:

              Instruction                                 Dependency Information
 +----< 2.    seta	%cl
 |
 |    < loop carried > 
 |
 |      0.    movb	$-1, %al
 +----> 1.    cmpq	%rsi, %rdi                        ## RESOURCE interference:  BWPort6 [ probability: 97% ]
 +----> 2.    seta	%cl                               ## REGISTER dependency:  %flags
 |      3.    jb	.LBB1_2
 +----> 4.    movl	%ecx, %eax                        ## REGISTER dependency:  %cl

llvm-mca reports that the reciprocal throughput for the improved version is 1.5 or lower (getting as low as 1.3 on znver1; lower is better for reciprocal throughput) for various x86 architectures whereas the old code always exceeds 2.0, reaching 3.0 on core2.

@nagisa nagisa added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Aug 20, 2019
@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 20, 2019
@Centril Centril added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. E-help-wanted Call for participation: Help is requested to fix this issue. labels Aug 20, 2019
@bors bors closed this as completed in 30fd79c Aug 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-help-wanted Call for participation: Help is requested to fix this issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants