Auto merge of #94261 - michaelwoerister:debuginfo-types-refactor, r=w…

…esleywiser debuginfo: Refactor debuginfo generation for types This PR implements the refactoring of the `rustc_codegen_llvm::debuginfo::metadata` module as described in MCP rust-lang/compiler-team#482. In particular it - changes names to use `di_node` instead of `metadata` - uniformly names all functions that build new debuginfo nodes `build_xyz_di_node` - renames `CrateDebugContext` to `CodegenUnitDebugContext` (which is more accurate) - removes outdated parts from `compiler/rustc_codegen_llvm/src/debuginfo/doc.md` - moves `TypeMap` and functions that work directly work with it to a new `type_map` module - moves enum related builder functions to a new `enums` module - splits enum debuginfo building for the native and cpp-like cases, since they are mostly separate - uses `SmallVec` instead of `Vec` in many places - removes the old infrastructure for dealing with recursion cycles (`create_and_register_recursive_type_forward_declaration()`, `RecursiveTypeDescription`, `set_members_of_composite_type()`, `MemberDescription`, `MemberDescriptionFactory`, `prepare_xyz_metadata()`, etc) - adds `type_map::build_type_with_children()` as a replacement for dealing with recursion cycles - adds many (doc-)comments explaining what's going on - changes cpp-like naming for C-Style enums so they don't get a `enum$<...>` name (because the NatVis visualizer does not apply to them) - fixes detection of what is a C-style enum because some enums where classified as C-style even though they have fields - changes cpp-like naming for generator enums so that NatVis works for them - changes the position of discriminant debuginfo node so it is consistently nested inside the top-level union instead of, sometimes, next to it The following could be done in subsequent PRs: - add caching for `closure_saved_names_of_captured_variables` - add caching for `generator_layout_and_saved_local_names` - fix inconsistent handling of what is considered a C-style enum wrt to debuginfo - rename `metadata` module to `types` - move common generator fields to front instead of appending them This PR is based on #93644 which is not merged yet. Right now, the changes are all done in one big commit. They could be split into smaller commits but hopefully the list of changes above makes it tractable to review them as a single commit too. For now: r? `@ghost` (let's see if this affects compile times)
rust-lang · Mar 15, 2022 · 0407030 · 0407030
2 parents 3ba1ebe + aa2408a
commit 0407030
Show file tree

Hide file tree

Showing 24 changed files with 2,452 additions and 1,878 deletions.
diff --git a/compiler/rustc_codegen_gcc/src/debuginfo.rs b/compiler/rustc_codegen_gcc/src/debuginfo.rs
@@ -31,7 +31,7 @@ impl<'a, 'gcc, 'tcx> DebugInfoBuilderMethods for Builder<'a, 'gcc, 'tcx> {
 }
 
 impl<'gcc, 'tcx> DebugInfoMethods<'tcx> for CodegenCx<'gcc, 'tcx> {
-    fn create_vtable_metadata(&self, _ty: Ty<'tcx>, _trait_ref: Option<PolyExistentialTraitRef<'tcx>>, _vtable: Self::Value) {
+    fn create_vtable_debuginfo(&self, _ty: Ty<'tcx>, _trait_ref: Option<PolyExistentialTraitRef<'tcx>>, _vtable: Self::Value) {
         // TODO(antoyo)
     }
 

diff --git a/compiler/rustc_codegen_llvm/src/allocator.rs b/compiler/rustc_codegen_llvm/src/allocator.rs
@@ -140,8 +140,8 @@ pub(crate) unsafe fn codegen(
     llvm::LLVMDisposeBuilder(llbuilder);
 
     if tcx.sess.opts.debuginfo != DebugInfo::None {
-        let dbg_cx = debuginfo::CrateDebugContext::new(llmod);
-        debuginfo::metadata::compile_unit_metadata(tcx, module_name, &dbg_cx);
+        let dbg_cx = debuginfo::CodegenUnitDebugContext::new(llmod);
+        debuginfo::metadata::build_compile_unit_di_node(tcx, module_name, &dbg_cx);
         dbg_cx.finalize(tcx.sess);
     }
 }
diff --git a/compiler/rustc_codegen_llvm/src/consts.rs b/compiler/rustc_codegen_llvm/src/consts.rs
@@ -428,7 +428,7 @@ impl<'ll> StaticMethods for CodegenCx<'ll, '_> {
                 llvm::LLVMSetGlobalConstant(g, llvm::True);
             }
 
-            debuginfo::create_global_var_metadata(self, def_id, g);
+            debuginfo::build_global_var_di_node(self, def_id, g);
 
             if attrs.flags.contains(CodegenFnAttrFlags::THREAD_LOCAL) {
                 llvm::set_thread_local_mode(g, self.tls_model);

diff --git a/compiler/rustc_codegen_llvm/src/context.rs b/compiler/rustc_codegen_llvm/src/context.rs
@@ -95,7 +95,7 @@ pub struct CodegenCx<'ll, 'tcx> {
     pub isize_ty: &'ll Type,
 
     pub coverage_cx: Option<coverageinfo::CrateCoverageContext<'ll, 'tcx>>,
-    pub dbg_cx: Option<debuginfo::CrateDebugContext<'ll, 'tcx>>,
+    pub dbg_cx: Option<debuginfo::CodegenUnitDebugContext<'ll, 'tcx>>,
 
     eh_personality: Cell<Option<&'ll Value>>,
     eh_catch_typeinfo: Cell<Option<&'ll Value>>,
@@ -396,8 +396,12 @@ impl<'ll, 'tcx> CodegenCx<'ll, 'tcx> {
         };
 
         let dbg_cx = if tcx.sess.opts.debuginfo != DebugInfo::None {
-            let dctx = debuginfo::CrateDebugContext::new(llmod);
-            debuginfo::metadata::compile_unit_metadata(tcx, codegen_unit.name().as_str(), &dctx);
+            let dctx = debuginfo::CodegenUnitDebugContext::new(llmod);
+            debuginfo::metadata::build_compile_unit_di_node(
+                tcx,
+                codegen_unit.name().as_str(),
+                &dctx,
+            );
             Some(dctx)
         } else {
             None

diff --git a/compiler/rustc_codegen_llvm/src/debuginfo/doc.md b/compiler/rustc_codegen_llvm/src/debuginfo/doc.md
@@ -34,7 +34,7 @@ The function will take care of probing the cache for an existing node for
 that exact file path.
 
 All private state used by the module is stored within either the
-CrateDebugContext struct (owned by the CodegenCx) or the
+CodegenUnitDebugContext struct (owned by the CodegenCx) or the
 FunctionDebugContext (owned by the FunctionCx).
 
 This file consists of three conceptual sections:
@@ -72,21 +72,16 @@ describe(t = List)
       ...
 ```
 
-To break cycles like these, we use "forward declarations". That is, when
+To break cycles like these, we use "stubs". That is, when
 the algorithm encounters a possibly recursive type (any struct or enum), it
 immediately creates a type description node and inserts it into the cache
 *before* describing the members of the type. This type description is just
 a stub (as type members are not described and added to it yet) but it
 allows the algorithm to already refer to the type. After the stub is
 inserted into the cache, the algorithm continues as before. If it now
 encounters a recursive reference, it will hit the cache and does not try to
-describe the type anew.
-
-This behavior is encapsulated in the 'RecursiveTypeDescription' enum,
-which represents a kind of continuation, storing all state needed to
-continue traversal at the type members after the type has been registered
-with the cache. (This implementation approach might be a tad over-
-engineered and may change in the future)
+describe the type anew. This behavior is encapsulated in the
+`type_map::build_type_with_children()` function.
 
 
 ## Source Locations and Line Information
@@ -134,47 +129,3 @@ detection. The `create_argument_metadata()` and related functions take care
 of linking the `llvm.dbg.declare` instructions to the correct source
 locations even while source location emission is still disabled, so there
 is no need to do anything special with source location handling here.
-
-## Unique Type Identification
-
-In order for link-time optimization to work properly, LLVM needs a unique
-type identifier that tells it across compilation units which types are the
-same as others. This type identifier is created by
-`TypeMap::get_unique_type_id_of_type()` using the following algorithm:
-
-1. Primitive types have their name as ID
-
-2. Structs, enums and traits have a multipart identifier
-
-  1. The first part is the SVH (strict version hash) of the crate they
-     were originally defined in
-
-  2. The second part is the ast::NodeId of the definition in their
-     original crate
-
-  3. The final part is a concatenation of the type IDs of their concrete
-     type arguments if they are generic types.
-
-3. Tuple-, pointer-, and function types are structurally identified, which
-   means that they are equivalent if their component types are equivalent
-   (i.e., `(i32, i32)` is the same regardless in which crate it is used).
-
-This algorithm also provides a stable ID for types that are defined in one
-crate but instantiated from metadata within another crate. We just have to
-take care to always map crate and `NodeId`s back to the original crate
-context.
-
-As a side-effect these unique type IDs also help to solve a problem arising
-from lifetime parameters. Since lifetime parameters are completely omitted
-in debuginfo, more than one `Ty` instance may map to the same debuginfo
-type metadata, that is, some struct `Struct<'a>` may have N instantiations
-with different concrete substitutions for `'a`, and thus there will be N
-`Ty` instances for the type `Struct<'a>` even though it is not generic
-otherwise. Unfortunately this means that we cannot use `ty::type_id()` as
-cheap identifier for type metadata -- we have done this in the past, but it
-led to unnecessary metadata duplication in the best case and LLVM
-assertions in the worst. However, the unique type ID as described above
-*can* be used as identifier. Since it is comparatively expensive to
-construct, though, `ty::type_id()` is still used additionally as an
-optimization for cases where the exact same type has been seen before
-(which is most of the time).