[CIR][LowerToLLVM] Lowered LLVM code for pointer arithmetic should have inbounds #1191

liusy58 · 2024-12-02T03:53:46Z

Fix issue in #952.

seven-mile

Hello! Thank you for your interest and contribution. Here are some suggestions on the overall direction:

This change might not be suitable as an upstream modification to the LLVM dialect. If you believe it benefits all LLVM users, consider submitting this patch to the llvm-project monorepo.
Alternatively, investigate what might be missing in CIR's responsibilities compared with the original Clang CodeGen. For instance, in the original case described in issue #952 ("CIR generated LLVM code for pointer arithmetic misses inbounds"), the cir.ptr_stride operation generates a GEP when lowered to LLVM. It might be more appropriate to address the issue there, perhaps by introducing common helpers for creating GEPs. (I don't have full context on the issue, so this is for your reference only 😉)
Please add a test for each part of your changes. Ideally, development should be driven by your test cases.

Lancern · 2024-12-02T05:58:29Z

Thanks for your time working on this!

The current changes are not related to ClangIR, and this is not an appropriate way to resolve #952. Given the following input code:

void foo(int *iptr) { iptr + 2; }

The CIR generated for the above code would be something similar to:

// ...
%1 = cir.const 2 : i32
%2 = cir.ptr_stride(%0 : !cir.ptr<i32>, %1 : i32), !cir.ptr<i32>
// ...

The generated CIR is further lowered to the following LLVM dialect code in clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp:

%0 = llvm.getelementptr %1[%2] : (!llvm.ptr, i32) -> !llvm.ptr, i32

Apparently the root cause is that LowerToLLVM fails to add the inbounds attribute to the llvm.getelementptr operation. You should update code in LowerToLLVM.cpp accordingly to fix this problem.
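To make the lowering concrete: a single-index GEP adds index * sizeof(element) bytes to its base pointer, so `iptr + 2` on an `int *` advances by 8 bytes where `int` is 4 bytes wide. The sketch below models only that address arithmetic in plain C++; `gepByteOffset` and `strideByBytes` are hypothetical names, not LLVM or CIR APIs. Per the LLVM LangRef, a GEP without `inbounds` uses silently wrapping two's complement arithmetic, while `inbounds` lets the optimizer assume the result stays within the same allocated object (otherwise the result is poison).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the byte offset a single-index GEP such as
//   %0 = llvm.getelementptr %1[%2] : (!llvm.ptr, i32) -> !llvm.ptr, i32
// adds to its base: index * sizeof(element). Illustration only.
std::int64_t gepByteOffset(std::int64_t index, std::int64_t elemSize) {
  assert(elemSize > 0 && "element size must be positive");
  return index * elemSize;
}

// Hypothetical helper: applies the modeled byte offset to a real pointer so
// the result can be compared with ordinary C++ pointer arithmetic (iptr + 2).
int *strideByBytes(int *base, std::int64_t index) {
  char *bytes = reinterpret_cast<char *>(base);
  return reinterpret_cast<int *>(
      bytes + gepByteOffset(index, static_cast<std::int64_t>(sizeof(int))));
}
```

For an in-range index, `strideByBytes(buf, 2)` and `buf + 2` name the same element; the difference `inbounds` makes is only in what the optimizer may assume, not in the address computed.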

liusy58 · 2024-12-02T06:46:15Z

Alright, I'll work on LowerToLLVM to address this issue. In fact, I'm not entirely sure when the inbounds attribute should be added. I examined the code in Value *emitPointerArithmetic from clang/lib/CodeGen/CGExprScalar.cpp, but I couldn't identify a clear pattern. Could you please provide some guidance?

Lancern · 2024-12-02T09:36:58Z

@liusy58 The getelementptr instruction is actually emitted in the CodeGenFunction::EmitCheckedInBoundsGEP function defined in the file you pointed out. It has two overloads, and you can find that both overloads set the inbounds flag without many prior conditions.

The C++ standard says that if the result of pointer arithmetic is out of bounds, the behavior is undefined. So I believe for cir.ptr_stride, you should always add the inbounds attribute to the lowered llvm.getelementptr operation.
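A minimal sketch of the rule cited above, assuming the wording of C++ [expr.add]: if P points to element i of an array with n elements, P + j is defined only when 0 <= i + j <= n (n being the one-past-the-end position); otherwise the behavior is undefined. `isDefinedPointerAdd` is a hypothetical name that models the rule on indices rather than inspecting real pointers. As seven-mile points out later in this thread, the language rule alone does not settle whether lowering should emit `inbounds` unconditionally.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of [expr.add]: P points to element i of an array of n
// elements; P + j is defined only if the target index lies in [0, n].
// Extremely large i/j values that overflow i + j are out of scope here.
bool isDefinedPointerAdd(std::ptrdiff_t i, std::ptrdiff_t j, std::ptrdiff_t n) {
  assert(0 <= i && i <= n && "P must point into the array or one past it");
  std::ptrdiff_t target = i + j;
  return 0 <= target && target <= n;
}
```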

liusy58 · 2024-12-02T11:06:13Z

Thank you. Let me check it.

bcardosolopes · 2024-12-02T23:40:33Z

Thanks @Lancern and @seven-mile for the great review and clarifications. @liusy58 welcome to the ClangIR project!

liusy58 · 2024-12-03T09:36:18Z

@Lancern Hi, I have updated the code. Could you please review it?

Lancern

Thanks for working on this! The CI shows 13 failing tests; please resolve them and it should be good to go!

seven-mile

Thanks for the update and for bearing with all the comments! Adding inbounds unconditionally might not be quite right.

I believe the inbounds flag on a GEP is about low-level pointer arithmetic rather than the language's memory model. The keyword precisely controls the overflow behaviour (ref), which leads to a common pattern in OG CodeGen:

clangir/clang/lib/CodeGen/CGExprScalar.cpp, lines 4090 to 4095 in eacaabb:

    if (CGF.getLangOpts().isSignedOverflowDefined())
      return CGF.Builder.CreateGEP(elemTy, pointer, index, "add.ptr");

    return CGF.EmitCheckedInBoundsGEP(
        elemTy, pointer, index, isSigned, isSubtraction, op.E->getExprLoc(),
        "add.ptr");

Additionally, we should be careful about language conformance: some options are designed to control conformance or to enable extensions. The code above shows one instance: -fwrapv controlling signed overflow behavior (SOB). We should honor such options to keep the frontend functional ;)

OG CodeGen may have further considerations for specific cases. Usually, reliability comes from keeping the skeleton of the new code aligned with the old code. Since we have no choice but to migrate this logic to LowerToLLVM, we should be especially cautious. IMHO this fix doesn't need to be finished in one single patch.

For the next step, I think we can discuss what changes this patch should include. A good start is to consider only #952. If that works out, pack more changes into your follow-up patches, and so on. It's your first contribution after all, no need to hurry 😉
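The branch in the OG CodeGen snippet can be modeled as a small decision function. This is only a sketch of the logic, not a patch: `LoweringOptions`, `ptrStrideUsesInbounds`, and `gepSpelling` are hypothetical names, and it assumes the lowering can somehow observe the equivalent of `LangOptions::isSignedOverflowDefined()` (set by -fwrapv). It also ignores any extra checks `EmitCheckedInBoundsGEP` may emit (for example, under sanitizers).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the language options the CIR lowering would need
// to see; OG CodeGen queries LangOptions::isSignedOverflowDefined().
struct LoweringOptions {
  bool signedOverflowDefined = false;  // true under -fwrapv
};

// Hypothetical helper mirroring the OG CodeGen branch: plain GEP when signed
// overflow is defined, inbounds GEP otherwise.
bool ptrStrideUsesInbounds(const LoweringOptions &opts) {
  return !opts.signedOverflowDefined;
}

// Renders the LLVM IR opcode spelling the choice corresponds to.
std::string gepSpelling(const LoweringOptions &opts) {
  return ptrStrideUsesInbounds(opts) ? "getelementptr inbounds"
                                     : "getelementptr";
}
```

Keeping the decision in one helper follows the earlier suggestion of common GEP-creation helpers, so every cir.ptr_stride lowering site makes the same choice.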

liusy58 · 2024-12-04T05:38:58Z

Hi @seven-mile, I have updated the commit. Please review it. Thanks!

seven-mile · 2024-12-04T07:11:40Z

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

-      ptrStrideOp, resultTy, elementTy, adaptor.getBase(), index);
+  rewriter.replaceOpWithNewOp<mlir::LLVM::GEPOp>(ptrStrideOp, resultTy,
+                                                 elementTy, adaptor.getBase(),
+                                                 index, /*inbounds=*/true);


It's still unconditional, and we cannot accept miscompilation. Please make sure we emit the same LLVM IR as, e.g., this godbolt, or alternatively hit an assertion failure.

liusy58 · 2024-12-09

OK, I will work on it later.

smeenai · 2024-12-05T00:38:06Z

#886 is a related potential area for follow-up work here if you're interested :)

liusy58 closed this Dec 2, 2024

liusy58 reopened this Dec 2, 2024

liusy58 closed this Dec 2, 2024

liusy58 reopened this Dec 2, 2024

seven-mile requested changes Dec 2, 2024

liusy58 requested review from lanza and bcardosolopes as code owners December 3, 2024 09:04

liusy58 force-pushed the missing_inbounds branch from 12458ba to 9c7e362 December 3, 2024 09:18

liusy58 requested a review from seven-mile December 3, 2024 09:25

Lancern approved these changes Dec 3, 2024

liusy58 force-pushed the missing_inbounds branch 2 times, most recently from cec82cc to d9b7c41 December 3, 2024 12:59

seven-mile requested changes Dec 3, 2024

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp (outdated)

clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

liusy58 force-pushed the missing_inbounds branch from d9b7c41 to 4e05834 December 4, 2024 01:21

[CIR][LowerToLLVM] fixup! CIR generated LLVM code for pointer arithmetic should have inbounds. (0362c64)

liusy58 force-pushed the missing_inbounds branch from 17d944f to 0362c64 December 4, 2024 03:23

liusy58 changed the title ~~GEP with a constant offset should have inbounds attribute.~~ IR generated LLVM code for pointer arithmetic should have inbounds. Dec 4, 2024

Lancern changed the title ~~IR generated LLVM code for pointer arithmetic should have inbounds.~~ [CIR][LowerToLLVM] Lowered LLVM code for pointer arithmetic should have inbounds Dec 4, 2024

liusy58 requested a review from seven-mile December 4, 2024 05:25

seven-mile reviewed Dec 4, 2024
