bcmp/memcmp removal optimization should remove unneeded allocas #52701

Open
scottmcm opened this issue Dec 14, 2021 · 3 comments
@scottmcm

Example from rust-lang/rust#91838: https://godbolt.org/z/9h83ezxvj
Demonstration that running opt -O3 again doesn't help: https://llvm.godbolt.org/z/qdanMeEar
Codegen repro via llc trunk: https://llvm.godbolt.org/z/oxMh6fEjq

It's excellent that short, known-length memcmp can just be mov+cmp in codegen.

But, unfortunately, if one side of the comparison was passed directly (by value, not via a pointer), the alloca it was written into so that memcmp could be called sticks around, and the generated assembly writes the argument to the stack and then immediately reads it back (a C-level sketch of this kind of source follows the two listings):

demo_before:
        mov     dword ptr [rsp - 4], edx
        cmp     rsi, 4
        jne     .LBB0_1
        mov     eax, dword ptr [rdi]
        cmp     eax, dword ptr [rsp - 4]
        sete    al
        ret
.LBB0_1:
        xor     eax, eax
        ret

It would be nice if it could instead be:

	cmp	rsi, 4
	jne	.LBB1_1
	cmp	dword ptr [rdi], edx
	sete	al
	ret

.LBB1_1:
	xor	eax, eax
	ret
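For reference, here is a hedged C-level sketch of the kind of source that produces the pattern above. The function names and exact signature are assumptions inferred from the assembly (pointer in rdi, length in rsi, the by-value 32-bit argument in edx), not the actual source behind the godbolt links:

    #include <stddef.h>
    #include <string.h>

    /* By-value `b` has no address of its own, so the frontend spills it to a
       stack slot (the alloca) just so memcmp can take a pointer to it. */
    _Bool demo_before(const unsigned char *a, size_t len, unsigned int b) {
        return len == sizeof(b) && memcmp(a, &b, sizeof(b)) == 0;
    }

    /* For contrast: when the second operand is already in memory, no
       temporary is needed, and the expansion should be a clean load+compare. */
    _Bool demo_by_pointer(const unsigned char *a, size_t len, const unsigned int *b) {
        return len == sizeof(*b) && memcmp(a, b, sizeof(*b)) == 0;
    }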
@nikic
Contributor

nikic commented Dec 15, 2021

The problem is that ExpandMemCmp (and MergeICmps as well, for that matter) only run as part of the backend pipeline, so there is little optimization happening after they run.

There was a previous attempt to move these to the end of the module pipeline, but it got reverted. I don't quite remember why. Maybe @legrosbuffle knows.
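To make the ordering problem concrete, here is a hedged C-level paraphrase (illustrative names only, not taken from any actual IR dump) of what the example above looks like once ExpandMemCmp has replaced the memcmp call: the compare itself is just two 4-byte loads, but the temporary that existed only so memcmp could take the argument's address is still written and read back. In the module pipeline, SROA/mem2reg would forward that store to the load; in the backend pipeline, nothing that runs afterwards does.

    #include <stddef.h>
    #include <string.h>

    /* `tmp` stands in for the alloca that was created so memcmp could take
       the address of the by-value argument. */
    _Bool demo_after_expansion(const unsigned char *a, size_t len, unsigned int b) {
        unsigned int tmp;
        memcpy(&tmp, &b, sizeof(tmp));   /* the store nothing cleans up */
        if (len != sizeof(tmp))
            return 0;
        unsigned int lhs;
        memcpy(&lhs, a, sizeof(lhs));    /* the load the expansion emits */
        return lhs == tmp;               /* the read back out of the temporary */
    }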

@scottmcm
Author

scottmcm commented May 22, 2023

Now that https://llvm.org/docs/LangRef.html#llvm-memcpy-inline-intrinsic exists, could this maybe happen in the module pipeline for the .inline version, since it's never allowed to become a function call? It seems plausible to turn that into instructions early, in a way that I agree might not be appropriate for a general memcpy.

Brainfart, this issue is about memcmp, not memcpy, so there's no .inline version. Ignore me 🤦

@legrosbuffle
Contributor

> There was a previous attempt to move these to the end of the module pipeline. Maybe @legrosbuffle knows.

Sorry I missed this. Yes, that was https://reviews.llvm.org/D60318. Unfortunately, that patch interfered with the sanitizers: they run after the pass and no longer see the memcmp, which prevents them from doing their interception work. There were also compile-time regressions on some binaries, because it's harder for LLVM to deal with a large number of loads than with a memcmp.

> Brainfart, this issue is about memcmp, not memcpy, so there's no .inline version. Ignore me 🤦

That being said, at one point gchatelet@ was planning on adding memcmp.inline too :)
