bcmp/memcmp removal optimization should remove unneeded allocas #52701

Open
scottmcm opened this issue Dec 14, 2021 · 3 comments
@scottmcm

Example from rust-lang/rust#91838: https://godbolt.org/z/9h83ezxvj
Demonstration that running opt -O3 again doesn't help: https://llvm.godbolt.org/z/qdanMeEar
Codegen repro via llc trunk: https://llvm.godbolt.org/z/oxMh6fEjq

It's excellent that short, known-length memcmp can just be mov+cmp in codegen.

But, unfortunately, if one side of the comparison was passed directly (by value, not via a pointer), the alloca it was written into so that memcmp could be called sticks around, and the generated assembly writes the argument to the stack and then immediately reads it back (a C-level sketch of this kind of source follows the two listings):

demo_before:
        mov     dword ptr [rsp - 4], edx
        cmp     rsi, 4
        jne     .LBB0_1
        mov     eax, dword ptr [rdi]
        cmp     eax, dword ptr [rsp - 4]
        sete    al
        ret
.LBB0_1:
        xor     eax, eax
        ret

It would be nice if it could instead be:

	cmp	rsi, 4
	jne	.LBB1_1
	cmp	dword ptr [rdi], edx
	sete	al
	ret

.LBB1_1:
	xor	eax, eax
	ret
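For reference, here is a hedged C-level sketch of the kind of source that produces the pattern above. The function names and exact signature are assumptions inferred from the assembly (pointer in rdi, length in rsi, the by-value 32-bit argument in edx), not the actual source behind the godbolt links:

    #include <stddef.h>
    #include <string.h>

    /* By-value `b` has no address of its own, so the frontend spills it to a
       stack slot (the alloca) just so memcmp can take a pointer to it. */
    _Bool demo_before(const unsigned char *a, size_t len, unsigned int b) {
        return len == sizeof(b) && memcmp(a, &b, sizeof(b)) == 0;
    }

    /* For contrast: when the second operand is already in memory, no
       temporary is needed, and the expansion should be a clean load+compare. */
    _Bool demo_by_pointer(const unsigned char *a, size_t len, const unsigned int *b) {
        return len == sizeof(*b) && memcmp(a, b, sizeof(*b)) == 0;
    }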
@nikic
Contributor

nikic commented Dec 15, 2021

The problem is that ExpandMemCmp (and MergeICmps as well, for that matter) only run as part of the backend pipeline, so there is little optimization happening after they run.

There was a previous attempt to move these to the end of the module pipeline, but it got reverted. I don't quite remember why. Maybe @legrosbuffle knows.
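To make the ordering problem concrete, here is a hedged C-level paraphrase (illustrative names only, not taken from any actual IR dump) of what the example above looks like once ExpandMemCmp has replaced the memcmp call: the compare itself is just two 4-byte loads, but the temporary that existed only so memcmp could take the argument's address is still written and read back. In the module pipeline, SROA/mem2reg would forward that store to the load; in the backend pipeline, nothing that runs afterwards does.

    #include <stddef.h>
    #include <string.h>

    /* `tmp` stands in for the alloca that was created so memcmp could take
       the address of the by-value argument. */
    _Bool demo_after_expansion(const unsigned char *a, size_t len, unsigned int b) {
        unsigned int tmp;
        memcpy(&tmp, &b, sizeof(tmp));   /* the store nothing cleans up */
        if (len != sizeof(tmp))
            return 0;
        unsigned int lhs;
        memcpy(&lhs, a, sizeof(lhs));    /* the load the expansion emits */
        return lhs == tmp;               /* the read back out of the temporary */
    }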

@scottmcm
Author

scottmcm commented May 22, 2023

Now that https://llvm.org/docs/LangRef.html#llvm-memcpy-inline-intrinsic exists, could this maybe happen in the module pipeline for the .inline version, since it's never allowed to become a function call? It seems plausible to turn that into instructions early, in a way that I agree might not be appropriate for a general memcpy.

Brainfart, this issue is about memcmp, not memcpy, so there's no .inline version. Ignore me 🤦

@legrosbuffle
Contributor

> There was a previous attempt to move these to the end of the module pipeline. Maybe @legrosbuffle knows.

Sorry I missed this. Yes, that was https://reviews.llvm.org/D60318. Unfortunately, that patch interfered with the sanitizers: they run after the pass and no longer see the memcmp, which prevents them from doing their interception work. There were also compile-time regressions on some binaries, because it's harder for LLVM to deal with a large number of loads than with a memcmp.

> Brainfart, this issue is about memcmp, not memcpy, so there's no .inline version. Ignore me 🤦

That being said, at one point gchatelet@ was planning on adding memcmp.inline too :)
