-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for builtin_verbose_trap #79230
Conversation
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-clang Author: Akira Hatanaka (ahatanak) ChangesThe builtin causes the program to stop its execution abnormally and shows a human-readable description of the reason for the termination when a debugger is attached or in a symbolicated crash log. The motivation for the builtin is explained in the following RFC: https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845 Full diff: https://github.com/llvm/llvm-project/pull/79230.diff 12 Files Affected:
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index c1420079f75118d..4526bc2df53e422 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -3379,6 +3379,54 @@ Query for this feature with ``__has_builtin(__builtin_debugtrap)``.
Query for this feature with ``__has_builtin(__builtin_trap)``.
+``__builtin_verbose_trap``
+------------------
+
+``__builtin_verbose_trap`` causes the program to stop its execution abnormally
+and shows a human-readable description of the reason for the termination when a
+debugger is attached or in a symbolicated crash log.
+
+**Syntax**:
+
+.. code-block:: c++
+
+ __builtin_verbose_trap(const char *reason)
+
+**Description**
+
+``__builtin_verbose_trap`` is lowered to the ` ``llvm.trap`` <https://llvm.org/docs/LangRef.html#llvm-trap-intrinsic>`_ builtin.
+Additionally, clang emits debug metadata that represents an artificial inline
+frame whose name encodes the string passed to the builtin, prefixed by a "magic"
+prefix.
+
+For example, consider the following code:
+
+.. code-block:: c++
+
+ void foo(int* p) {
+ if (p == nullptr)
+ __builtin_verbose_trap("Argument_must_not_be_null");
+ }
+
+The debug metadata would look as if it were produced for the following code:
+
+.. code-block:: c++
+
+ __attribute__((always_inline))
+ inline void "__llvm_verbose_trap: Argument_must_not_be_null"() {
+ __builtin_trap();
+ }
+
+ void foo(int* p) {
+ if (p == nullptr)
+ "__llvm_verbose_trap: Argument_must_not_be_null"();
+ }
+
+However, the LLVM IR would not actually contain a call to the artificial
+function — it only exists in the debug metadata.
+
+Query for this feature with ``__has_builtin(__builtin_verbose_trap)``.
+
``__builtin_nondeterministic_value``
------------------------------------
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 6e31849ce16dd40..6da216bbd19250a 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -268,6 +268,8 @@ Non-comprehensive list of changes in this release
and vector types as return value ``19``, to match GCC 14's behavior.
* The default value of `_MSC_VER` was raised from 1920 to 1933.
* Since MSVC 19.33 added undocumented attribute ``[[msvc::constexpr]]``, this release adds the attribute as well.
+* Support for ``__builtin_verbose_trap`` has been added. See
+ https://clang.llvm.org/docs/LanguageExtensions.html#builtin-functions.
* Added ``#pragma clang fp reciprocal``.
diff --git a/clang/include/clang/AST/Expr.h b/clang/include/clang/AST/Expr.h
index a41f2d66b37b69d..34d913748d3021f 100644
--- a/clang/include/clang/AST/Expr.h
+++ b/clang/include/clang/AST/Expr.h
@@ -774,6 +774,11 @@ class Expr : public ValueStmt {
const Expr *PtrExpression, ASTContext &Ctx,
EvalResult &Status) const;
+ /// If the current Expr can be evaluated to a pointer to a null-terminated
+ /// constant string, return the constant string (without the terminating null)
+ /// in Result. Return true if it succeeds.
+ bool tryEvaluateString(std::string &Result, ASTContext &Ctx) const;
+
/// Enumeration used to describe the kind of Null pointer constant
/// returned from \c isNullPointerConstant().
enum NullPointerConstantKind {
diff --git a/clang/include/clang/Basic/Builtins.def b/clang/include/clang/Basic/Builtins.def
index 4dcbaf8a7beaa65..e9bea088941987f 100644
--- a/clang/include/clang/Basic/Builtins.def
+++ b/clang/include/clang/Basic/Builtins.def
@@ -673,6 +673,7 @@ BUILTIN(__builtin_expect_with_probability, "LiLiLid", "ncE")
BUILTIN(__builtin_prefetch, "vvC*.", "nc")
BUILTIN(__builtin_readcyclecounter, "ULLi", "n")
BUILTIN(__builtin_trap, "v", "nr")
+BUILTIN(__builtin_verbose_trap, "vcC*", "nr")
BUILTIN(__builtin_debugtrap, "v", "n")
BUILTIN(__builtin_unreachable, "v", "nr")
BUILTIN(__builtin_shufflevector, "v." , "nct")
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index c3213f7ccb8c88e..bc01c7876e52012 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -8764,6 +8764,8 @@ def err_expected_callable_argument : Error<
"expected a callable expression as %ordinal0 argument to %1, found %2">;
def note_building_builtin_dump_struct_call : Note<
"in call to printing function with arguments '(%0)' while dumping struct">;
+def err_builtin_verbose_trap_arg : Error<
+ "argument to __builtin_verbose_trap must be a non-empty string literal">;
def err_atomic_load_store_uses_lib : Error<
"atomic %select{load|store}0 requires runtime support that is not "
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index f20850d14c0c860..0519c604b63a5ad 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -1880,7 +1880,8 @@ static bool EvaluateAtomic(const Expr *E, const LValue *This, APValue &Result,
EvalInfo &Info);
static bool EvaluateAsRValue(EvalInfo &Info, const Expr *E, APValue &Result);
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
- EvalInfo &Info);
+ EvalInfo &Info,
+ std::string *StringResult = nullptr);
/// Evaluate an integer or fixed point expression into an APResult.
static bool EvaluateFixedPointOrInteger(const Expr *E, APFixedPoint &Result,
@@ -16612,7 +16613,7 @@ bool Expr::tryEvaluateObjectSize(uint64_t &Result, ASTContext &Ctx,
}
static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
- EvalInfo &Info) {
+ EvalInfo &Info, std::string *StringResult) {
if (!E->getType()->hasPointerRepresentation() || !E->isPRValue())
return false;
@@ -16639,6 +16640,8 @@ static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
Str = Str.substr(0, Pos);
Result = Str.size();
+ if (StringResult)
+ *StringResult = Str;
return true;
}
@@ -16654,12 +16657,21 @@ static bool EvaluateBuiltinStrLen(const Expr *E, uint64_t &Result,
if (!Char.getInt()) {
Result = Strlen;
return true;
- }
+ } else if (StringResult)
+ StringResult->push_back(Char.getInt().getExtValue());
if (!HandleLValueArrayAdjustment(Info, E, String, CharTy, 1))
return false;
}
}
+bool Expr::tryEvaluateString(std::string &StringResult, ASTContext &Ctx) const {
+ Expr::EvalStatus Status;
+ EvalInfo Info(Ctx, Status, EvalInfo::EM_ConstantFold);
+ uint64_t Result;
+
+ return EvaluateBuiltinStrLen(this, Result, Info, &StringResult);
+}
+
bool Expr::EvaluateCharRangeAsString(std::string &Result,
const Expr *SizeExpression,
const Expr *PtrExpression, ASTContext &Ctx,
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 1ed35befe1361f4..27401e6c00647a8 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -3212,6 +3212,18 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
case Builtin::BI__builtin_trap:
EmitTrapCall(Intrinsic::trap);
return RValue::get(nullptr);
+ case Builtin::BI__builtin_verbose_trap: {
+ llvm::DILocation *TrapLocation = Builder.getCurrentDebugLocation();
+ if (getDebugInfo()) {
+ std::string Str;
+ E->getArg(0)->tryEvaluateString(Str, getContext());
+ TrapLocation = getDebugInfo()->CreateTrapFailureMessageFor(
+ TrapLocation, "__llvm_verbose_trap", Str);
+ }
+ ApplyDebugLocation ApplyTrapDI(*this, TrapLocation);
+ EmitTrapCall(Intrinsic::trap);
+ return RValue::get(nullptr);
+ }
case Builtin::BI__debugbreak:
EmitTrapCall(Intrinsic::debugtrap);
return RValue::get(nullptr);
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 236d53bee4e8f18..9b939ce560e7bba 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -1628,6 +1628,27 @@ llvm::DIType *CGDebugInfo::createFieldType(
offsetInBits, flags, debugType, Annotations);
}
+llvm::DISubprogram *
+CGDebugInfo::getFakeFuncSubprogram(const std::string &FakeFuncName) {
+ llvm::DISubprogram *&SP = FakeFuncMap[FakeFuncName];
+
+ if (!SP) {
+ auto FileScope = TheCU->getFile();
+ llvm::DISubroutineType *DIFnTy = DBuilder.createSubroutineType(nullptr);
+ // Note: We use `FileScope` rather than `TheCU` as the scope because that's
+ // what LLVM's inliner seems to do.
+ SP = DBuilder.createFunction(
+ /*Scope=*/FileScope, /*Name=*/FakeFuncName, /*LinkageName=*/StringRef(),
+ /*File=*/FileScope, /*LineNo=*/0, /*Ty=*/DIFnTy,
+ /*ScopeLine=*/0,
+ /*Flags=*/llvm::DINode::FlagArtificial,
+ /*SPFlags=*/llvm::DISubprogram::SPFlagDefinition,
+ /*TParams=*/nullptr, /*ThrownTypes=*/nullptr, /*Annotations=*/nullptr);
+ }
+
+ return SP;
+}
+
void CGDebugInfo::CollectRecordLambdaFields(
const CXXRecordDecl *CXXDecl, SmallVectorImpl<llvm::Metadata *> &elements,
llvm::DIType *RecordTy) {
@@ -3416,6 +3437,27 @@ llvm::DIMacroFile *CGDebugInfo::CreateTempMacroFile(llvm::DIMacroFile *Parent,
return DBuilder.createTempMacroFile(Parent, Line, FName);
}
+llvm::DILocation *CGDebugInfo::CreateTrapFailureMessageFor(
+ llvm::DebugLoc TrapLocation, StringRef Prefix, StringRef FailureMsg) {
+ // Create debug info that describes a fake function whose name is the failure
+ // message.
+ std::string FuncName(Prefix);
+ if (!FailureMsg.empty()) {
+ // A space in the function name identifies this as not being a real function
+ // because it's not a valid symbol name.
+ FuncName += ": ";
+ FuncName += FailureMsg;
+ }
+
+ assert(FuncName.size() > 0);
+ assert(FuncName.find(' ') != std::string::npos &&
+ "Fake function name must contain a space");
+
+ llvm::DISubprogram *TrapSP = getFakeFuncSubprogram(FuncName);
+ return llvm::DILocation::get(CGM.getLLVMContext(), /*Line=*/0, /*Column=*/0,
+ /*Scope=*/TrapSP, /*InlinedAt=*/TrapLocation);
+}
+
static QualType UnwrapTypeForDebugInfo(QualType T, const ASTContext &C) {
Qualifiers Quals;
do {
diff --git a/clang/lib/CodeGen/CGDebugInfo.h b/clang/lib/CodeGen/CGDebugInfo.h
index 7b60e94555d0608..21edb5536742600 100644
--- a/clang/lib/CodeGen/CGDebugInfo.h
+++ b/clang/lib/CodeGen/CGDebugInfo.h
@@ -346,6 +346,14 @@ class CGDebugInfo {
const FieldDecl *BitFieldDecl, const llvm::DIDerivedType *BitFieldDI,
llvm::ArrayRef<llvm::Metadata *> PreviousFieldsDI, const RecordDecl *RD);
+ // A cache that maps fake function names used for __builtin_verbose_trap to
+ // subprograms.
+ std::map<std::string, llvm::DISubprogram *> FakeFuncMap;
+
+ // A function that returns the subprogram corresponding to the fake function
+ // name.
+ llvm::DISubprogram *getFakeFuncSubprogram(const std::string &FakeFuncName);
+
/// Helpers for collecting fields of a record.
/// @{
void CollectRecordLambdaFields(const CXXRecordDecl *CXXDecl,
@@ -602,6 +610,18 @@ class CGDebugInfo {
return CoroutineParameterMappings;
}
+ // Create a debug location from `TrapLocation` that adds a fake
+ // inline frame where the frame name is
+ //
+ // * `<Prefix>: <FailureMsg>` if `<FailureMsg>` is not empty.
+ // * `<Prefix>` if `<FailureMsg>` is empty. Note `<Prefix>` must
+ // contain a space.
+ //
+ // This is used to store failure reasons for traps.
+ llvm::DILocation *CreateTrapFailureMessageFor(llvm::DebugLoc TrapLocation,
+ StringRef Prefix,
+ StringRef FailureMsg);
+
private:
/// Emit call to llvm.dbg.declare for a variable declaration.
/// Returns a pointer to the DILocalVariable associated with the
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index ace3e386988f005..2a340236c73e8ed 100644
--- a/clang/lib/Sema/SemaChecking.cpp
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -171,6 +171,23 @@ static bool checkArgCount(Sema &S, CallExpr *Call, unsigned DesiredArgCount) {
<< /*is non object*/ 0 << Call->getArg(1)->getSourceRange();
}
+static bool checkBuiltinVerboseTrap(CallExpr *Call, Sema &S) {
+ Expr *Arg = Call->getArg(0);
+
+ if (Arg->isValueDependent())
+ return true;
+
+ // FIXME: Add more checks and reject strings that can't be handled by
+ // debuggers.
+ std::string Result;
+ if (!Arg->tryEvaluateString(Result, S.Context) || Result.empty()) {
+ S.Diag(Arg->getBeginLoc(), diag::err_builtin_verbose_trap_arg)
+ << Arg->getSourceRange();
+ return false;
+ }
+ return true;
+}
+
static bool convertArgumentToType(Sema &S, Expr *&Value, QualType Ty) {
if (Value->isTypeDependent())
return false;
@@ -2881,6 +2898,11 @@ Sema::CheckBuiltinFunctionCall(FunctionDecl *FDecl, unsigned BuiltinID,
case Builtin::BI__builtin_matrix_column_major_store:
return SemaBuiltinMatrixColumnMajorStore(TheCall, TheCallResult);
+ case Builtin::BI__builtin_verbose_trap:
+ if (!checkBuiltinVerboseTrap(TheCall, *this))
+ return ExprError();
+ break;
+
case Builtin::BI__builtin_get_device_side_mangled_name: {
auto Check = [](CallExpr *TheCall) {
if (TheCall->getNumArgs() != 1)
diff --git a/clang/test/CodeGenCXX/debug-info-verbose-trap.cpp b/clang/test/CodeGenCXX/debug-info-verbose-trap.cpp
new file mode 100644
index 000000000000000..3433a08312f274a
--- /dev/null
+++ b/clang/test/CodeGenCXX/debug-info-verbose-trap.cpp
@@ -0,0 +1,49 @@
+// RUN: %clang_cc1 -std=c++20 -emit-llvm -debug-info-kind=limited %s -o - | FileCheck %s
+
+// CHECK-LABEL: define void @_Z2f0v()
+// CHECK: call void @llvm.trap(), !dbg ![[LOC17:.*]]
+
+// CHECK-LABEL: define void @_Z2f1v()
+// CHECK: call void @llvm.trap(), !dbg ![[LOC23:.*]]
+// CHECK: call void @llvm.trap(), !dbg ![[LOC25:.*]]
+
+// CHECK-LABEL: define void @_Z2f3v()
+// CHECK: call void @_Z2f2IXadsoKcL_ZL8constMsgEEEEvv()
+
+// CHECK-LABEL: define internal void @_Z2f2IXadsoKcL_ZL8constMsgEEEEvv()
+// CHECK: call void @llvm.trap(), !dbg ![[LOC36:.*]]
+
+// CHECK: ![[FILESCOPE:.*]] = !DIFile(filename:
+// CHECK: ![[SUBPROG14:.*]] = distinct !DISubprogram(name: "f0", linkageName: "_Z2f0v",
+// CHECK: ![[LOC17]] = !DILocation(line: 0, scope: ![[SUBPROG18:.*]], inlinedAt: ![[LOC20:.*]])
+// CHECK: ![[SUBPROG18]] = distinct !DISubprogram(name: "__llvm_verbose_trap: Argument_must_not_be_null", scope: ![[FILESCOPE]], file: ![[FILESCOPE]], type: !{{.*}}, flags: DIFlagArtificial, spFlags: DISPFlagDefinition, unit: !{{.*}})
+// CHECK: ![[LOC20]] = !DILocation(line: 34, column: 3, scope: ![[SUBPROG14]])
+// CHECK: ![[SUBPROG22:.*]] = distinct !DISubprogram(name: "f1", linkageName: "_Z2f1v",
+// CHECK: ![[LOC23]] = !DILocation(line: 0, scope: ![[SUBPROG18]], inlinedAt: ![[LOC24:.*]])
+// CHECK: ![[LOC24]] = !DILocation(line: 38, column: 3, scope: ![[SUBPROG22]])
+// CHECK: ![[LOC25]] = !DILocation(line: 0, scope: ![[SUBPROG26:.*]], inlinedAt: ![[LOC27:.*]])
+// CHECK: ![[SUBPROG26]] = distinct !DISubprogram(name: "__llvm_verbose_trap: hello", scope: ![[FILESCOPE]], file: ![[FILESCOPE]], type: !{{.*}}, flags: DIFlagArtificial, spFlags: DISPFlagDefinition, unit: !{{.*}})
+// CHECK: ![[LOC27]] = !DILocation(line: 39, column: 3, scope: ![[SUBPROG22]])
+// CHECK: ![[SUBPROG32:.*]] = distinct !DISubprogram(name: "f2<constMsg>", linkageName: "_Z2f2IXadsoKcL_ZL8constMsgEEEEvv",
+// CHECK: ![[LOC36]] = !DILocation(line: 0, scope: ![[SUBPROG26]], inlinedAt: ![[LOC37:.*]])
+// CHECK: ![[LOC37]] = !DILocation(line: 44, column: 3, scope: ![[SUBPROG32]])
+
+char const constMsg[] = "hello";
+
+void f0() {
+ __builtin_verbose_trap("Argument_must_not_be_null");
+}
+
+void f1() {
+ __builtin_verbose_trap("Argument_must_not_be_null");
+ __builtin_verbose_trap("hello");
+}
+
+template <const char * const str>
+void f2() {
+ __builtin_verbose_trap(str);
+}
+
+void f3() {
+ f2<constMsg>();
+}
diff --git a/clang/test/SemaCXX/verbose-trap.cpp b/clang/test/SemaCXX/verbose-trap.cpp
new file mode 100644
index 000000000000000..7c152325d30815e
--- /dev/null
+++ b/clang/test/SemaCXX/verbose-trap.cpp
@@ -0,0 +1,28 @@
+// RUN: %clang_cc1 -std=c++11 -fsyntax-only -fcxx-exceptions -verify %s
+
+#if !__has_builtin(__builtin_verbose_trap)
+#error
+#endif
+
+constexpr char const* constMsg1 = "hello";
+char const* const constMsg2 = "hello";
+char const constMsg3[] = "hello";
+
+template <const char * const str>
+void f(const char * arg) {
+ __builtin_verbose_trap("Argument_must_not_be_null");
+ __builtin_verbose_trap("hello" "world");
+ __builtin_verbose_trap(constMsg1);
+ __builtin_verbose_trap(constMsg2);
+ __builtin_verbose_trap(""); // expected-error {{argument to __builtin_verbose_trap must be a non-empty string literal}}
+ __builtin_verbose_trap(); // expected-error {{too few arguments}}
+ __builtin_verbose_trap(0); // expected-error {{argument to __builtin_verbose_trap must be a non-empty string literal}}
+ __builtin_verbose_trap(1); // expected-error {{cannot initialize a parameter of type 'const char *' with}}
+ __builtin_verbose_trap(arg); // expected-error {{argument to __builtin_verbose_trap must be a non-empty string literal}}
+ __builtin_verbose_trap(str);
+ __builtin_verbose_trap(u8"hello");
+}
+
+void test() {
+ f<constMsg3>(nullptr);
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A cosmetic suggestion I have would be to replace all instances of Fake
with inlinedTrap
to make the purpose clearer.
clang/lib/CodeGen/CGDebugInfo.cpp
Outdated
@@ -1628,6 +1628,27 @@ llvm::DIType *CGDebugInfo::createFieldType( | |||
offsetInBits, flags, debugType, Annotations); | |||
} | |||
|
|||
llvm::DISubprogram * | |||
CGDebugInfo::getFakeFuncSubprogram(const std::string &FakeFuncName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this createInlinedTrapSubprogram
?
clang/lib/CodeGen/CGDebugInfo.cpp
Outdated
@@ -3416,6 +3437,27 @@ llvm::DIMacroFile *CGDebugInfo::CreateTempMacroFile(llvm::DIMacroFile *Parent, | |||
return DBuilder.createTempMacroFile(Parent, Line, FName); | |||
} | |||
|
|||
llvm::DILocation *CGDebugInfo::CreateTrapFailureMessageFor( | |||
llvm::DebugLoc TrapLocation, StringRef Prefix, StringRef FailureMsg) { | |||
// Create debug info that describes a fake function whose name is the failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Create debug info that describes a fake function whose name is the failure | |
// Create debug info that describes an inlined function whose name is the error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat idea (I imagine some debuggers will trip over the space in the inlined name, but seems pretty good)
I'd guess sometimes we'll lose the inlined debug info if two of these traps are merged, fwiw - but it's probably relatively robust, apart from that.
#if !__has_builtin(__builtin_verbose_trap) | ||
#error | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just makes sure the builtin is supported unconditionally and doesn't depend on some features or anything (cc @ldionne ).
For context, we've been using this same trick in the Swift standard library for a long time now, to get a good trade-off between space-efficient code generation and usability for inlineable functions such as the arithmetic |
For spaces specifically there's a multi-decade precedent: Objective-C methods have names like |
clang/lib/CodeGen/CGDebugInfo.cpp
Outdated
} | ||
|
||
assert(FuncName.size() > 0); | ||
assert(FuncName.find(' ') != std::string::npos && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the pre condition (as stated in the function documentation) is that Prefix
should have a space, but here we are checking FuncName
instead.
We add a space to FuncName
whenever FailureMsg
is not empty, so this assert is not catching a pre condition violation in all cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it means Prefix
is required to have a space only when FailureMsg
is empty. If FailureMsg
is empty, FuncName
is the same as Prefix
, so the checks seem correct to me. @delcypher is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can better express this intent by moving the assert before the if
and by rewriting it as the faster assert(!FailureMessage.empty() || Prefix.find(' ') != npos)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that should be fine. We should also make sure someone doesn't accidentally remove the space in FuncName
inside the if
, but I guess that's okay since there's a comment there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it means Prefix is required to have a space only when FailureMsg is empty. If FailureMsg is empty, FuncName is the same as Prefix, so the checks seem correct to me. @delcypher is that correct?
The actually intent here was to ensure the artificial function has a space in it. That's it, nothing more complicated than that. The check is done on the final function name and not earlier so the assert doesn't need to care about the prefix.
The origin of requiring a space was based on an off-hand comment (I think) @adrian-prantl made a while back to me that having a space in the function name (in C at least) would not be a valid function identifier (and therefore wouldn't conflict with any existing function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the right answer here is:
We guarantee not to collide with other symbols by use the reserved __
prefix. Any consumer that wants to identify such a symbol checks for the __llvm_builtin_trap
prefix to unambiguously identify these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The space is not needed for correctness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation should say that this works only when debug information is enabled.
clang/docs/LanguageExtensions.rst
Outdated
**Description** | ||
|
||
``__builtin_verbose_trap`` is lowered to the ` ``llvm.trap`` <https://llvm.org/docs/LangRef.html#llvm-trap-intrinsic>`_ builtin. | ||
Additionally, clang emits debug metadata that represents an artificial inline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"metadata" is a clang internal term, of no real meaning to end users, and this page is intended for end users. I think "debugging information" would be better.
clang/docs/LanguageExtensions.rst
Outdated
__builtin_verbose_trap("Argument_must_not_be_null"); | ||
} | ||
|
||
The debug metadata would look as if it were produced for the following code: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"debugging information"
clang/docs/LanguageExtensions.rst
Outdated
} | ||
|
||
However, the LLVM IR would not actually contain a call to the artificial | ||
function — it only exists in the debug metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the generated code would not ... in the debugging information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahatanak Any follow up?
clang/docs/LanguageExtensions.rst
Outdated
|
||
void foo(int* p) { | ||
if (p == nullptr) | ||
__builtin_verbose_trap("Argument_must_not_be_null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__builtin_verbose_trap("Argument_must_not_be_null"); | |
__builtin_verbose_trap("Argument must not be null!"); |
To make it clear that the builtin accepts arbitrary string literals, not just valid identifiers.
clang/test/SemaCXX/verbose-trap.cpp
Outdated
|
||
template <const char * const str> | ||
void f(const char * arg) { | ||
__builtin_verbose_trap("Argument_must_not_be_null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__builtin_verbose_trap("Argument_must_not_be_null"); | |
__builtin_verbose_trap("Arbitrary string literals can be used!"); | |
__builtin_verbose_trap("Argument_must_not_be_null"); |
I think this would require encoding the message, but it's really important to be able to pass arbitrary string literals here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Can we also check:
- A non-ASCII Unicode character?
- A long string (is there any limitation on the string length?)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Yes, it can handle something like
__builtin_verbose_trap(u8"¡¢£¤¥¦§¨©ª");
. - How long of a string do you need to test? I don't think there's a limitation that's specific to this builtin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, okay, then I don't understand how this works under the hood, but I'm happy :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the code doesn't compile in c++20
because the type of UTF8 string literals is char8_t[N]
instead of char[N]
. This is the error message:
cannot initialize a parameter of type 'const char *' with an lvalue of type 'const char8_t[21]'
It compiles if I cast it to the correct type (__builtin_verbose_trap((const char *)u8"¡¢£¤¥¦§¨©ª");
).
How important is it to be able to pass UTF8 string literals to the builtin in c++20 without using any casts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@philnik777 you are right. I'll enable unicode in follow-up patches using custom type checking.
@var-const 128 and 256 character strings should work without any problems. I don't think you'll see any problems with long strings unless you hit implementation limits of clang or lldb. I tested a 512 char string literal locally and both clang and lldb work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imposing this length limit seems pretty arbitrary. What's the actually restriction on the debug info/LLDB side? If there isn't one I don't think we should be adding an artificial limit without a good reason for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch doesn't add any artificial limit on the string length, but I meant that there's probably a limit on the length of string literals the compiler can handle, which is determined by the implementation. I don't know the exact limit, but it's a lot larger than 256 and 512.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as the compiler isn't imposing an artificial limit then that seems fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I've seen multiple MB symbol names :) so I probably wouldn't be too worried that any user would write a description that's longer than the symbols that complex templates can create...
437441a
to
3e54bf3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the libc++ perspective, the only substantial comment I have is that ideally we need to make sure it's possible to pass long strings with arbitrary characters as the string parameter (or, if it's infeasible to implement, we need to document the limitations).
clang/test/SemaCXX/verbose-trap.cpp
Outdated
|
||
template <const char * const str> | ||
void f(const char * arg) { | ||
__builtin_verbose_trap("Argument_must_not_be_null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Can we also check:
- A non-ASCII Unicode character?
- A long string (is there any limitation on the string length?)?
clang/lib/CodeGen/CGDebugInfo.h
Outdated
@@ -346,6 +346,14 @@ class CGDebugInfo { | |||
const FieldDecl *BitFieldDecl, const llvm::DIDerivedType *BitFieldDI, | |||
llvm::ArrayRef<llvm::Metadata *> PreviousFieldsDI, const RecordDecl *RD); | |||
|
|||
// A cache that maps artificial inlined function names used for | |||
// __builtin_verbose_trap to subprograms. | |||
std::map<std::string, llvm::DISubprogram *> InlinedTrapFuncMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this header might need to include <map>
and <string>
, otherwise it appears to rely on those headers being included transitively from somewhere. (Unless of course there's some Clang convention against doing that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& consider using llvm::StringMap
, perhaps?
clang/test/SemaCXX/verbose-trap.cpp
Outdated
__builtin_verbose_trap("hello" "world"); | ||
__builtin_verbose_trap(constMsg1); | ||
__builtin_verbose_trap(constMsg2); | ||
__builtin_verbose_trap(""); // expected-error {{argument to __builtin_verbose_trap must be a pointer to a non-empty constant string}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to make that (an empty string literal) work? It seems like a bit of an arbitrary limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from the libc++ perspective, thanks a lot for working on this!
clang/lib/CodeGen/CGDebugInfo.h
Outdated
@@ -346,6 +346,14 @@ class CGDebugInfo { | |||
const FieldDecl *BitFieldDecl, const llvm::DIDerivedType *BitFieldDI, | |||
llvm::ArrayRef<llvm::Metadata *> PreviousFieldsDI, const RecordDecl *RD); | |||
|
|||
// A cache that maps artificial inlined function names used for | |||
// __builtin_verbose_trap to subprograms. | |||
std::map<std::string, llvm::DISubprogram *> InlinedTrapFuncMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& consider using llvm::StringMap
, perhaps?
@@ -3424,6 +3445,26 @@ llvm::DIMacroFile *CGDebugInfo::CreateTempMacroFile(llvm::DIMacroFile *Parent, | |||
return DBuilder.createTempMacroFile(Parent, Line, FName); | |||
} | |||
|
|||
llvm::DILocation *CGDebugInfo::CreateTrapFailureMessageFor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's currently only one caller with a fixed parameter for Prefix
- perhaps that could be hardcoded in the implementation instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwblaikie We currently use this function for -fbounds-safety
internally. We are currently in the process of upstreaming this which is why the function should not hardcode the Prefix
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwblaikie what do you think? -fbounds-safety
is currently being upstreamed and there is going to be another caller of the function in the not-too-distant future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwblaikie any comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with https://github.com/llvm/llvm-project/pull/79230/files#r1529429173 then - the prefix should be a parameter and I don't think it needs to be a #defined constant (CLANG_VERBOSE_TRAP_PREFIX) - and can instead be written as a literal in the caller in CGBuiltin.cpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll make it a parameter then.
I added macro CLANG_VERBOSE_TRAP_PREFIX
to ModuleBuilder.h
so that lldb wouldn't have to define it too. See #80368 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwblaikie 's comment:
#79230 (comment)
So the name of the subprogram would be something like __builtin_trap: identifier123 : Argument_must_not_be_null
?
@Michael137 @delcypher is there a reason we cannot use the same prefix (__builtin_trap
, for example)? Does lldb need clang to use different prefixes for -fbounds-safety
and builtin_verbose_trap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the patch to pass a category string to the builtin. The artificial function name has the format <prefix>:<category>:<message>
where <prefix>
is always __builtin_verbose_trap
.
We should probably use something other than :
for the separator as users might want to use it in the category or message string (e.g., boost::some_library_name
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dwblaikie any other comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, I'm not too fussed about the separator for the prefix (we control that) and category (will the category be user-visible? Like a warning group? Or is that only still an implementation detail/contract between DWARF producer and DWARF consumer? I thought it was more the latter/implementation detail? In which case we can say what characters are valid there, and I'd keep it pretty simple, like the warning group names - all lower, dash or underscore separated seems fine)
clang/lib/CodeGen/CGDebugInfo.cpp
Outdated
SmallString<64> FuncName(Prefix); | ||
if (!FailureMsg.empty()) { | ||
// A space in the function name identifies this as not being a real function | ||
// because it's not a valid symbol name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this comment is true (or necessary). Objective-C methods have spaces in them, for example. I would just remove the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrian-prantl Sorry. This comment is based on an off-hand comment you made a while back in the context of -fbounds-safety
(which only supports C) where I think you said having a space in the function name would distinguish the artificial function from any real function. While that's true for C I guess it's not true for other languages.
clang/lib/CodeGen/CGDebugInfo.h
Outdated
// | ||
// Currently `<Prefix>` is always "__llvm_verbose_trap". | ||
// | ||
// This is used to store failure reasons for traps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
///
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahatanak Any follow up?
clang/lib/CodeGen/CGDebugInfo.h
Outdated
llvm::StringMap<llvm::DISubprogram *> InlinedTrapFuncMap; | ||
|
||
// A function that returns the subprogram corresponding to the artificial | ||
// inlined function for traps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
///
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahatanak Any follow up?
clang/lib/CodeGen/CGDebugInfo.h
Outdated
@@ -346,6 +348,15 @@ class CGDebugInfo { | |||
const FieldDecl *BitFieldDecl, const llvm::DIDerivedType *BitFieldDI, | |||
llvm::ArrayRef<llvm::Metadata *> PreviousFieldsDI, const RecordDecl *RD); | |||
|
|||
// A cache that maps artificial inlined function names used for | |||
// __builtin_verbose_trap to subprograms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
///
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahatanak Any follow up?
2c52acd
to
4896af0
Compare
I'll merge this in a day or two if there are no objections. |
@@ -27,6 +27,9 @@ namespace llvm { | |||
} | |||
} | |||
|
|||
// Prefix for __builtin_verbose_trap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahatanak Does this comment need updating? The prefix isn’t just for __builtin_verbose_trap anymore.
@@ -27,6 +27,9 @@ namespace llvm { | |||
} | |||
} | |||
|
|||
// Prefix for __builtin_verbose_trap. | |||
#define CLANG_VERBOSE_TRAP_PREFIX "__llvm_verbose_trap" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahatanak Nit: Why does the macro name mention CLANG
but the string mentions llvm
instead of Clang? Seems like this prefix is specific to Clang and not LLVM.
Also here’s suggestion for a shorter prefix __llvm_trap_msg
. It’s not a big deal so feel to ignore this advice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have to be a macro or could it be a C++ constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we used a C++ constant, wouldn't it be harder to make a compile time constant string (e.g., ^__llvm_verbose_trap:
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@delcypher I don't think it has to mention llvm
. Can we make it even shorter, e.g., __trap_msg
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we usually use some llvm or clang prefix in these sort of things, to reduce the chance of conflict with some other implementation's reserved names? Could be worth a quick grepping for other similar symbol names in clang/llvm to see what we do/if there's some pattern to follow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
& not sure it being a compile time constant regex is super important - the regex compilation probably dwarfs the string concatenation required to add the ^
and :
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make it a C++ constant if that's preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any preference, but according to people working on lldb, macro would make it marginally easier to construct the regex in the LLDB frame recognizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal for LLDB either way, whatever people prefer here, we can work around it.
We could use the function name without the prefix as the key when searching for the subprogram in |
@@ -29,7 +29,9 @@ | |||
#include "llvm/IR/DebugInfo.h" | |||
#include "llvm/IR/ValueHandle.h" | |||
#include "llvm/Support/Allocator.h" | |||
#include <map> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to not belong to this commit?
clang/lib/Sema/SemaChecking.cpp
Outdated
continue; | ||
|
||
// FIXME: Add more checks and reject strings that can't be handled by | ||
// debuggers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this FIXME. A debug info producer can't know what can and cannot be handled by debuggers, and DWARF consumers need to support any string in the specified encoding (usually UTF-8) anyway.
Hmm, do other builtins actually end up as symbol names? I think most of them lower to an instruction or two - I guess this one doesn't lower to an actual symbol, only a DWARF symbol - but maybe that's still different enough it should use an llvm or clang in the name... (sorry, I think I saw this was discussed somewhere on the review, but I can't find where now... :/) I guess it came up here: #79230 (comment) and at the time it still had |
@dwblaikie @ahatanak Is there anything we can do to move this patch forward? It seems like it's almost ready to go with perhaps one remaining thread of conversation to address. We badly need this functionality in libc++, the lack of error messages when a hardening check is triggered results in very poor user experience. It also makes it harder to talk about hardening since we essentially need to describe both the current state and the state after this patch is merged. |
I think my last comment/question is still open? How/why did the symbol name end up dropping any llvm/clang component to avoid collisions with other names? |
I dropped llvm/clang because I didn't think it would cause any collision in practice, but I don't think we have to try too hard to shorten the prefix. |
@@ -27,6 +27,9 @@ namespace llvm { | |||
} | |||
} | |||
|
|||
// Prefix of the name of the artificial inline frame. | |||
#define CLANG_TRAP_PREFIX "__clang_trap_msg" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline StringRef? usual reasons to avoid macros, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK - one minor nit, I'd probably avoid defining CLANG_TRAP_PREFIX as a macro, but instead as an inline or some other form of constant
@@ -28,7 +29,7 @@ namespace llvm { | |||
} | |||
|
|||
// Prefix of the name of the artificial inline frame. | |||
#define CLANG_TRAP_PREFIX "__clang_trap_msg" | |||
inline constexpr llvm::StringRef CLANG_TRAP_PREFIX = "__clang_trap_msg"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name should be changed to use the usual variable/constant naming convention (ClangTrapPrefix, I think, in this case).
@ahatanak Excited to see this finally land 🎉 |
@ahatanak Thank you so much for pushing this through! This will be awesome for our hardening effort in libc++. Thanks to everyone for all the feedback and discussions as well! |
The builtin causes the program to stop its execution abnormally and shows a human-readable description of the reason for the termination when a debugger is attached or in a symbolicated crash log. The motivation for the builtin is explained in the following RFC: https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845 clang's CodeGen lowers the builtin to `llvm.trap` and emits debugging information that represents an artificial inline frame whose name encodes the category and reason strings passed to the builtin.
The builtin causes the program to stop its execution abnormally and shows a human-readable description of the reason for the termination when a debugger is attached or in a symbolicated crash log.
The motivation for the builtin is explained in the following RFC:
https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845
clang's CodeGen lowers the builtin to
llvm.trap
and emits debugging information that represents an artificialinline frame whose name encodes the category and reason strings passed to the builtin.