Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CIR][LowerToLLVM] Lowered LLVM code for pointer arithmetic should have inbounds #1191

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

liusy58
Copy link

@liusy58 liusy58 commented Dec 2, 2024

Fix issue in #952.

@liusy58 liusy58 closed this Dec 2, 2024
@liusy58 liusy58 reopened this Dec 2, 2024
@liusy58 liusy58 closed this Dec 2, 2024
@liusy58 liusy58 reopened this Dec 2, 2024
Copy link
Collaborator

@seven-mile seven-mile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Thank you for your interest and contribution. Here are some suggestions on the overall direction:

  1. This change might not be suitable as an upstream modification to the LLVM dialect. If you believe it benefits all LLVM users, consider submitting this patch to the llvm-project monorepo.
  2. Alternatively, investigate what might be missing in CIR's responsibilities, comparing with original Clang CodeGen. For instance, in the original case described in issue CIR generated LLVM code for pointer arithmetic misses inbounds  #952, the cir.ptr_stride operation generates a GEP when lowering to LLVM. It might be more appropriate to address the issue there, perhaps by introducing common helpers for creating GEPs. (I don't have full context of the issue, for your reference only 😉)
  3. Please add a test for each part of your changes. Ideally, development should be driven by your test cases.

@Lancern
Copy link
Member

Lancern commented Dec 2, 2024

Thanks for your time working on this!

The current changes are not related to ClangIR and it's not an appropriate way to resolve #952 . Given the following input code:

void foo(int *iptr) { iptr + 2; }

The CIR generated for the above code would be something similar to:

// ...
%1 = cir.const 2 : i32
%2 = cir.ptr_stride(%0 : !cir.ptr<i32>, %1 : i32), !cir.ptr<i32>
// ...

The generated CIR is further lowered to the following LLVM dialect code in clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp:

%0 = llvm.getelementptr %1[%2] : (!llvm.ptr, i32) -> !llvm.ptr, i32

Apparently the root cause is that LowerToLLVM fails to add the inbounds attribute to the llvm.getelementptr operation. You should update code in LowerToLLVM.cpp accordingly to fix this problem.

@liusy58
Copy link
Author

liusy58 commented Dec 2, 2024

Alright, I'll work on LowerToLLVM to address this issue. In fact, I'm not entirely sure when the inbounds attribute should be added. I examined the code in Value *emitPointerArithmetic from clang/lib/CodeGen/CGExprScalar.cpp, but I couldn't identify a clear pattern. Could you please provide some guidance?

@Lancern
Copy link
Member

Lancern commented Dec 2, 2024

@liusy58 The getelementptr instruction is actually emitted in the CodeGenFunction::EmitCheckedInBoundsGEP function defined in the file you pointed out. It has two overloads, and you can find that both overloads set the inbounds flag without many prior conditions.

The C++ standard says that if the result of pointer arithmetic is out of bounds, the behavior is undefined. So I believe for cir.ptr_stride, you should always add the inbounds attribute to the lowered llvm.getelementptr operation.

@liusy58
Copy link
Author

liusy58 commented Dec 2, 2024

Thank you. Let me check it.

@bcardosolopes
Copy link
Member

Thanks @Lancern and @seven-mile for the great review and clarifications. @liusy58 welcome to the ClangIR project!

@liusy58
Copy link
Author

liusy58 commented Dec 3, 2024

@Lancern Hi, I have update the code and could you please review it?

Copy link
Member

@Lancern Lancern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! The CI shows you have 13 failed tests, please resolve them and it should be good to go!

@liusy58 liusy58 force-pushed the missing_inbounds branch 2 times, most recently from cec82cc to d9b7c41 Compare December 3, 2024 12:59
Copy link
Collaborator

@seven-mile seven-mile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update and bearing all the comments! Adding inbounds unconditionally might be considered not quite right.

I believe inbounds of GEP is about low-level pointer arithmetic rather than memory model in the language. The keyword controls the overflow behaviour concisely (ref), which leads to a common pattern in OG CodeGen:

if (CGF.getLangOpts().isSignedOverflowDefined())
return CGF.Builder.CreateGEP(elemTy, pointer, index, "add.ptr");
return CGF.EmitCheckedInBoundsGEP(
elemTy, pointer, index, isSigned, isSubtraction, op.E->getExprLoc(),
"add.ptr");

Additionally, we'd better be careful to apply language conformance: some options are designed to control the conformance or provide some extensions. The code above indicates an instance: -fwrapv controlling SOB. We should take care of them to keep the frontend functional ; )

There might be other considerations for a specific case in OG CodeGen. Usually the reliability comes from the correspondence of skeleton between the old and new codes. Given the fact that we have no choice but migrate these logic to LowerToLLVM, we should be especially cautious. IMHO this fix is not necessarily finished in one single patch.

For the next step, I think we can discuss what changes should this patch include. A good start is to just consider #952. If it's suitable, pack more changes in your following patches, and so on. It's your first-time contribution after all, no need to hurry 😉

} else {
auto loc = baseClassOp.getLoc();
mlir::Value isNull = rewriter.create<mlir::LLVM::ICmpOp>(
loc, mlir::LLVM::ICmpPredicate::eq, derivedAddr,
rewriter.create<mlir::LLVM::ZeroOp>(loc, derivedAddr.getType()));
mlir::Value adjusted = rewriter.create<mlir::LLVM::GEPOp>(
loc, resultType, byteType, derivedAddr, offset);
rewriter.replaceOpWithNewOp<mlir::LLVM::SelectOp>(baseClassOp, isNull,
derivedAddr, adjusted);
rewriter.replaceOpWithNewOp<mlir::LLVM::SelectOp>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change needed for SelectOp. Similar elsewhere.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you chose to fix all occurrences of GEP op under all conditions, which is far beyond #952 or the PR title ("GEP with a constant offset"). It would be very valuable to state the scope explicitly in PR description. Of course, feel free to continue our discussion and update the PR information later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, actually, there's no manual that clearly states when inbounds should be added. I can't find a clear pattern. :-( Plus, when I focus on #952 , some tests fail because there are different types of GEP. Haha...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if I just focus on #952 and adjust the tests that fail accordingly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liusy58 Let's keep the scope of changes limited. For now, let's just focus on the GEP instructions lowered from cir.ptr_stride only. If you find other GEP instructions suitable for the missing flag, you could propose those changes in future PRs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lancern Hi, please review again and check whether all tests will pass.

@liusy58 liusy58 changed the title GEP with a constant offset should have inbounds attribute. IR generated LLVM code for pointer arithmetic should have inbounds. Dec 4, 2024
@Lancern Lancern changed the title IR generated LLVM code for pointer arithmetic should have inbounds. [CIR][LowerToLLVM] Lowered LLVM code for pointer arithmetic should have inbounds Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants