Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant branches with ctlz and cttz #47467

Open
Diggsey opened this issue Nov 9, 2020 · 3 comments
Open

Redundant branches with ctlz and cttz #47467

Diggsey opened this issue Nov 9, 2020 · 3 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@Diggsey
Copy link

Diggsey commented Nov 9, 2020

Bugzilla Link 48123
Version trunk
OS Windows NT
CC @topperc,@RKSimon,@phoebewang,@rotateright

Extended Description

Rust code:

pub fn can_represent_as_f64(x: u64) -> bool {
    x.leading_zeros() + x.trailing_zeros() >= 11
}

LLVM IR:

define zeroext i1 @_ZN10playground20can_represent_as_f6417h8c9d47bab619cb5fE(i64 %x) {
start:
  %0 = tail call i64 @llvm.ctlz.i64(i64 %x, i1 false)
  %1 = trunc i64 %0 to i32
  %2 = tail call i64 @llvm.cttz.i64(i64 %x, i1 false)
  %3 = trunc i64 %2 to i32
  %_2 = add nuw nsw i32 %1, %3
  %4 = icmp ugt i32 %_2, 10
  ret i1 %4
}

Assembly:

playground::can_represent_as_f64:
	mov	eax, 64
	mov	ecx, 64
	test	rdi, rdi    ; Initial test for zero and branch
	je	.LBB0_2
	bsr	rcx, rdi
	xor	rcx, 63

.LBB0_2:
	test	rdi, rdi    ; Second test for zero and branch
	je	.LBB0_4
	bsf	rax, rdi

.LBB0_4:
	add	ecx, eax
	cmp	ecx, 10
	seta	al
	ret

Instead of performing the comparison twice, the code should immediately branch to LBB0_4.

@rotateright
Copy link
Contributor

rotateright commented Nov 11, 2020

We expand the intrinsics in -codegenprepare, and I'm not sure where we would solve this. machine-cse seems like the most likely candidate, but it would require tracking eflags state across basic blocks. Not sure if we do that:

    TEST64rr %5, %5, implicit-def $eflags
    JCC_1 %bb.2, 4, implicit $eflags

IR going into SDAG:

  define zeroext i1 @_ZN10playground20can_represent_as_f6417h8c9d47bab619cb5fE(i64 %x) unnamed_addr {
  start:
    %cmpz = icmp eq i64 %x, 0
    br i1 %cmpz, label %cond.end, label %cond.false
  
  cond.false:                                       ; preds = %start
    %0 = tail call i64 @llvm.ctlz.i64(i64 %x, i1 true)
    br label %cond.end
  
  cond.end:                                         ; preds = %start, %cond.false
    %ctz = phi i64 [ 64, %start ], [ %0, %cond.false ]
    %1 = trunc i64 %ctz to i32
    %cmpz3 = icmp eq i64 %x, 0
    br i1 %cmpz3, label %cond.end2, label %cond.false1
  
  cond.false1:                                      ; preds = %cond.end
    %2 = tail call i64 @llvm.cttz.i64(i64 %x, i1 true)
    br label %cond.end2
  
  cond.end2:                                        ; preds = %cond.end, %cond.false1
    %ctz4 = phi i64 [ 64, %cond.end ], [ %2, %cond.false1 ]
    %3 = trunc i64 %ctz4 to i32
    %_2 = add nuw nsw i32 %1, %3
    %4 = icmp ugt i32 %_2, 10
    ret i1 %4
  }

@rotateright
Copy link
Contributor

Note that this should not be an issue when compiling for more recent x86:

lzcntq	%rdi, %rax
tzcntq	%rdi, %rcx
addl	%eax, %ecx
cmpl	$10, %ecx
seta	%al
retq

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@RKSimon
Copy link
Collaborator

RKSimon commented Sep 3, 2024

After #102885 the branches are now avoided on any x86 target with cmov:

playground::can_represent_as_f64::h8c9d47bab619cb5f: # @playground::can_represent_as_f64::h8c9d47bab619cb5f
  bsrq %rdi, %rax
  movl $127, %ecx
  cmovneq %rax, %rcx
  xorl $63, %ecx
  bsfq %rdi, %rax
  movl $64, %edx
  cmovneq %rax, %rdx
  addl %ecx, %edx
  cmpl $11, %edx
  setae %al
  retq

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants