[IR] Reformulate LLVM's EH funclet IR

While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies: - catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts. - catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticable effect on the quality of our code generation. - catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets. - The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront. Let's clean this up a bit by doing the following: - Instead, make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable. - Introduce catchswitch, a control flow instruction designed to model the constraints of funclet oriented EH. - Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them. - Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information. N.B. The state numbering code for the CLR has been updated but the veracity of it's output cannot be spoken for. An expert should take a look to make sure the results are reasonable. Reviewers: rnk, JosephTremoulet, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D15139 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255422 91177308-0d34-0410-b5e6-96231b3b80d8
microsoft · Dec 12, 2015 · 8cec2f2 · 8cec2f2
1 parent b73b347
commit 8cec2f2
Show file tree

Hide file tree

Showing 106 changed files with 3,078 additions and 6,086 deletions.
diff --git a/docs/ExceptionHandling.rst b/docs/ExceptionHandling.rst
@@ -522,16 +522,12 @@ table.
 Exception Handling using the Windows Runtime
 =================================================
 
-(Note: Windows C++ exception handling support is a work in progress and is not
-yet fully implemented.  The text below describes how it will work when
-completed.)
-
 Background on Windows exceptions
 ---------------------------------
 
-Interacting with exceptions on Windows is significantly more complicated than on
-Itanium C++ ABI platforms. The fundamental difference between the two models is
-that Itanium EH is designed around the idea of "successive unwinding," while
+Interacting with exceptions on Windows is significantly more complicated than
+on Itanium C++ ABI platforms. The fundamental difference between the two models
+is that Itanium EH is designed around the idea of "successive unwinding," while
 Windows EH is not.
 
 Under Itanium, throwing an exception typically involes allocating thread local
@@ -618,10 +614,11 @@ purposes.
 
 The following new instructions are considered "exception handling pads", in that
 they must be the first non-phi instruction of a basic block that may be the
-unwind destination of an invoke: ``catchpad``, ``cleanuppad``, and
-``terminatepad``. As with landingpads, when entering a try scope, if the
+unwind destination of an EH flow edge:
+``catchswitch``, ``catchpad``, ``cleanuppad``, and ``terminatepad``.
+As with landingpads, when entering a try scope, if the
 frontend encounters a call site that may throw an exception, it should emit an
-invoke that unwinds to a ``catchpad`` block. Similarly, inside the scope of a
+invoke that unwinds to a ``catchswitch`` block. Similarly, inside the scope of a
 C++ object with a destructor, invokes should unwind to a ``cleanuppad``. The
 ``terminatepad`` instruction exists to represent ``noexcept`` and throw
 specifications with one combined instruction. All potentially throwing calls in
@@ -634,26 +631,20 @@ generated funclet).  A catch handler which reaches its end by normal execution
 executes a ``catchret`` instruction, which is a terminator indicating where in
 the function control is returned to.  A cleanup handler which reaches its end
 by normal execution executes a ``cleanupret`` instruction, which is a terminator
-indicating where the active exception will unwind to next.  A catch or cleanup
-handler which is exited by another exception being raised during its execution will
-unwind through a ``catchendpad`` or ``cleanuupendpad`` (respectively).  The
-``catchendpad`` and ``cleanupendpad`` instructions are considered "exception
-handling pads" in the same sense that ``catchpad``, ``cleanuppad``, and
-``terminatepad`` are.
-
-Each of these new EH pad instructions has a way to identify which
-action should be considered after this action. The ``catchpad`` and
-``terminatepad`` instructions are terminators, and have a label operand considered
-to be an unwind destination analogous to the unwind destination of an invoke. The
-``cleanuppad`` instruction is different from the other two in that it is not a
-terminator. The code inside a cleanuppad runs before transferring control to the
-next action, so the ``cleanupret`` and ``cleanupendpad`` instructions are the
-instructions that hold a label operand and unwind to the next EH pad. All of
-these "unwind edges" may refer to a basic block that contains an EH pad instruction,
-or they may simply unwind to the caller. Unwinding to the caller has roughly the
-same semantics as the ``resume`` instruction in the ``landingpad`` model. When
-inlining through an invoke, instructions that unwind to the caller are hooked
-up to unwind to the unwind destination of the call site.
+indicating where the active exception will unwind to next.
+
+Each of these new EH pad instructions has a way to identify which action should
+be considered after this action. The ``catchswitch`` and ``terminatepad``
+instructions are terminators, and have a unwind destination operand analogous
+to the unwind destination of an invoke.  The ``cleanuppad`` instruction is not
+a terminator, so the unwind destination is stored on the ``cleanupret``
+instruction instead. Successfully executing a catch handler should resume
+normal control flow, so neither ``catchpad`` nor ``catchret`` instructions can
+unwind. All of these "unwind edges" may refer to a basic block that contains an
+EH pad instruction, or they may unwind to the caller.  Unwinding to the caller
+has roughly the same semantics as the ``resume`` instruction in the landingpad
+model. When inlining through an invoke, instructions that unwind to the caller
+are hooked up to unwind to the unwind destination of the call site.
 
 Putting things together, here is a hypothetical lowering of some C++ that uses
 all of the new IR instructions:
@@ -694,33 +685,95 @@ all of the new IR instructions:
     call void @"\01??_DCleanup@@QEAA@XZ"(%struct.Cleanup* nonnull %obj) nounwind
     br label %return
 
-  return:                                           ; preds = %invoke.cont.2, %invoke.cont.3
-    %retval.0 = phi i32 [ 0, %invoke.cont.2 ], [ %9, %catch ]
+  return:                                           ; preds = %invoke.cont.3, %invoke.cont.2
+    %retval.0 = phi i32 [ 0, %invoke.cont.2 ], [ %3, %invoke.cont.3 ]
     ret i32 %retval.0
 
-  ; EH scope code, ordered innermost to outermost:
-
-  lpad.cleanup:                                     ; preds = %invoke.cont
-    %cleanup = cleanuppad []
-    call void @"\01??_DCleanup@@QEAA@XZ"(%struct.Cleanup* nonnull %obj) nounwind
-    cleanupret %cleanup unwind label %lpad.catch
+  lpad.cleanup:                                     ; preds = %invoke.cont.2
+    %0 = cleanuppad within none []
+    call void @"\01??1Cleanup@@QEAA@XZ"(%struct.Cleanup* nonnull %obj) nounwind
+    cleanupret %0 unwind label %lpad.catch
 
-  lpad.catch:                                       ; preds = %entry, %lpad.cleanup
-    %catch = catchpad [%rtti.TypeDescriptor2* @"\01??_R0H@8", i32 0, i32* %e]
-            to label %catch.body unwind label %catchend
+  lpad.catch:                                       ; preds = %lpad.cleanup, %entry
+    %1 = catchswitch within none [label %catch.body] unwind label %lpad.terminate
 
   catch.body:                                       ; preds = %lpad.catch
+    %catch = catchpad within %1 [%rtti.TypeDescriptor2* @"\01??_R0H@8", i32 0, i32* %e]
     invoke void @"\01?may_throw@@YAXXZ"()
-            to label %invoke.cont.3 unwind label %catchend
+            to label %invoke.cont.3 unwind label %lpad.terminate
 
   invoke.cont.3:                                    ; preds = %catch.body
-    %9 = load i32, i32* %e, align 4
-    catchret %catch to label %return
+    %3 = load i32, i32* %e, align 4
+    catchret from %catch to label %return
+
+  lpad.terminate:                                   ; preds = %catch.body, %lpad.catch
+    terminatepad within none [void ()* @"\01?terminate@@YAXXZ"] unwind to caller
+  }
+
+Funclet parent tokens
+-----------------------
 
-  catchend:                                         ; preds = %lpad.catch, %catch.body
-    catchendpad unwind label %lpad.terminate
+In order to produce tables for EH personalities that use funclets, it is
+necessary to recover the nesting that was present in the source. This funclet
+parent relationship is encoded in the IR using tokens produced by the new "pad"
+instructions. The token operand of a "pad" or "ret" instruction indicates which
+funclet it is in, or "none" if it is not nested within another funclet.
 
-  lpad.terminate:                                   ; preds = %catchend
-    terminatepad [void ()* @"\01?terminate@@YAXXZ"]
-            unwind to caller
+The ``catchpad`` and ``cleanuppad`` instructions establish new funclets, and
+their tokens are consumed by other "pad" instructions to establish membership.
+The ``catchswitch`` instruction does not create a funclet, but it produces a
+token that is always consumed by its immediate successor ``catchpad``
+instructions. This ensures that every catch handler modelled by a ``catchpad``
+belongs to exactly one ``catchswitch``, which models the dispatch point after a
+C++ try. The ``terminatepad`` instruction cannot contain lexically nested
+funclets inside the termination action, so it does not produce a token.
+
+Here is an example of what this nesting looks like using some hypothetical
+C++ code:
+
+.. code-block:: c
+
+  void f() {
+    try {
+      throw;
+    } catch (...) {
+      try {
+        throw;
+      } catch (...) {
+      }
+    }
   }
+
+.. code-block:: llvm
+  define void @f() #0 personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) {
+  entry:
+    invoke void @_CxxThrowException(i8* null, %eh.ThrowInfo* null) #1
+            to label %unreachable unwind label %catch.dispatch
+
+  catch.dispatch:                                   ; preds = %entry
+    %0 = catchswitch within none [label %catch] unwind to caller
+
+  catch:                                            ; preds = %catch.dispatch
+    %1 = catchpad within %0 [i8* null, i32 64, i8* null]
+    invoke void @_CxxThrowException(i8* null, %eh.ThrowInfo* null) #1
+            to label %unreachable unwind label %catch.dispatch2
+
+  catch.dispatch2:                                  ; preds = %catch
+    %2 = catchswitch within %1 [label %catch3] unwind to caller
+
+  catch3:                                           ; preds = %catch.dispatch2
+    %3 = catchpad within %2 [i8* null, i32 64, i8* null]
+    catchret from %3 to label %try.cont
+
+  try.cont:                                         ; preds = %catch3
+    catchret from %1 to label %try.cont6
+
+  try.cont6:                                        ; preds = %try.cont
+    ret void
+
+  unreachable:                                      ; preds = %catch, %entry
+    unreachable
+  }
+
+The "inner" ``catchswitch`` consumes ``%1`` which is produced by the outer
+catchswitch.