Skip to content

Commit

Permalink
Auto merge of #94261 - michaelwoerister:debuginfo-types-refactor, r=w…
Browse files Browse the repository at this point in the history
…esleywiser

debuginfo: Refactor debuginfo generation for types

This PR implements the refactoring of the `rustc_codegen_llvm::debuginfo::metadata` module as described in MCP rust-lang/compiler-team#482.

In particular it
- changes names to use `di_node` instead of `metadata`
- uniformly names all functions that build new debuginfo nodes `build_xyz_di_node`
- renames `CrateDebugContext` to `CodegenUnitDebugContext` (which is more accurate)
- removes outdated parts from `compiler/rustc_codegen_llvm/src/debuginfo/doc.md`
- moves `TypeMap` and functions that work directly work with it to a new `type_map` module
- moves enum related builder functions to a new `enums` module
- splits enum debuginfo building for the native and cpp-like cases, since they are mostly separate
- uses `SmallVec` instead of `Vec` in many places
- removes the old infrastructure for dealing with recursion cycles (`create_and_register_recursive_type_forward_declaration()`, `RecursiveTypeDescription`, `set_members_of_composite_type()`, `MemberDescription`, `MemberDescriptionFactory`, `prepare_xyz_metadata()`, etc)
- adds `type_map::build_type_with_children()` as a replacement for dealing with recursion cycles
- adds many (doc-)comments explaining what's going on
- changes cpp-like naming for C-Style enums so they don't get a `enum$<...>` name (because the NatVis visualizer does not apply to them)
- fixes detection of what is a C-style enum because some enums where classified as C-style even though they have fields
- changes cpp-like naming for generator enums so that NatVis works for them
- changes the position of discriminant debuginfo node so it is consistently nested inside the top-level union instead of, sometimes, next to it

The following could be done in subsequent PRs:
- add caching for `closure_saved_names_of_captured_variables`
- add caching for `generator_layout_and_saved_local_names`
- fix inconsistent handling of what is considered a C-style enum wrt to debuginfo
- rename `metadata` module to `types`
- move common generator fields to front instead of appending them

This PR is based on #93644 which is not merged yet.

Right now, the changes are all done in one big commit. They could be split into smaller commits but hopefully the list of changes above makes it tractable to review them as a single commit too.

For now: r? `@ghost` (let's see if this affects compile times)
  • Loading branch information
bors committed Mar 15, 2022
2 parents 3ba1ebe + aa2408a commit 0407030
Show file tree
Hide file tree
Showing 24 changed files with 2,452 additions and 1,878 deletions.
2 changes: 1 addition & 1 deletion compiler/rustc_codegen_gcc/src/debuginfo.rs
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ impl<'a, 'gcc, 'tcx> DebugInfoBuilderMethods for Builder<'a, 'gcc, 'tcx> {
}

impl<'gcc, 'tcx> DebugInfoMethods<'tcx> for CodegenCx<'gcc, 'tcx> {
fn create_vtable_metadata(&self, _ty: Ty<'tcx>, _trait_ref: Option<PolyExistentialTraitRef<'tcx>>, _vtable: Self::Value) {
fn create_vtable_debuginfo(&self, _ty: Ty<'tcx>, _trait_ref: Option<PolyExistentialTraitRef<'tcx>>, _vtable: Self::Value) {
// TODO(antoyo)
}

Expand Down
4 changes: 2 additions & 2 deletions compiler/rustc_codegen_llvm/src/allocator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -140,8 +140,8 @@ pub(crate) unsafe fn codegen(
llvm::LLVMDisposeBuilder(llbuilder);

if tcx.sess.opts.debuginfo != DebugInfo::None {
let dbg_cx = debuginfo::CrateDebugContext::new(llmod);
debuginfo::metadata::compile_unit_metadata(tcx, module_name, &dbg_cx);
let dbg_cx = debuginfo::CodegenUnitDebugContext::new(llmod);
debuginfo::metadata::build_compile_unit_di_node(tcx, module_name, &dbg_cx);
dbg_cx.finalize(tcx.sess);
}
}
2 changes: 1 addition & 1 deletion compiler/rustc_codegen_llvm/src/consts.rs
Original file line number Diff line number Diff line change
Expand Up @@ -428,7 +428,7 @@ impl<'ll> StaticMethods for CodegenCx<'ll, '_> {
llvm::LLVMSetGlobalConstant(g, llvm::True);
}

debuginfo::create_global_var_metadata(self, def_id, g);
debuginfo::build_global_var_di_node(self, def_id, g);

if attrs.flags.contains(CodegenFnAttrFlags::THREAD_LOCAL) {
llvm::set_thread_local_mode(g, self.tls_model);
Expand Down
10 changes: 7 additions & 3 deletions compiler/rustc_codegen_llvm/src/context.rs
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@ pub struct CodegenCx<'ll, 'tcx> {
pub isize_ty: &'ll Type,

pub coverage_cx: Option<coverageinfo::CrateCoverageContext<'ll, 'tcx>>,
pub dbg_cx: Option<debuginfo::CrateDebugContext<'ll, 'tcx>>,
pub dbg_cx: Option<debuginfo::CodegenUnitDebugContext<'ll, 'tcx>>,

eh_personality: Cell<Option<&'ll Value>>,
eh_catch_typeinfo: Cell<Option<&'ll Value>>,
Expand Down Expand Up @@ -396,8 +396,12 @@ impl<'ll, 'tcx> CodegenCx<'ll, 'tcx> {
};

let dbg_cx = if tcx.sess.opts.debuginfo != DebugInfo::None {
let dctx = debuginfo::CrateDebugContext::new(llmod);
debuginfo::metadata::compile_unit_metadata(tcx, codegen_unit.name().as_str(), &dctx);
let dctx = debuginfo::CodegenUnitDebugContext::new(llmod);
debuginfo::metadata::build_compile_unit_di_node(
tcx,
codegen_unit.name().as_str(),
&dctx,
);
Some(dctx)
} else {
None
Expand Down
57 changes: 4 additions & 53 deletions compiler/rustc_codegen_llvm/src/debuginfo/doc.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ The function will take care of probing the cache for an existing node for
that exact file path.

All private state used by the module is stored within either the
CrateDebugContext struct (owned by the CodegenCx) or the
CodegenUnitDebugContext struct (owned by the CodegenCx) or the
FunctionDebugContext (owned by the FunctionCx).

This file consists of three conceptual sections:
Expand Down Expand Up @@ -72,21 +72,16 @@ describe(t = List)
...
```

To break cycles like these, we use "forward declarations". That is, when
To break cycles like these, we use "stubs". That is, when
the algorithm encounters a possibly recursive type (any struct or enum), it
immediately creates a type description node and inserts it into the cache
*before* describing the members of the type. This type description is just
a stub (as type members are not described and added to it yet) but it
allows the algorithm to already refer to the type. After the stub is
inserted into the cache, the algorithm continues as before. If it now
encounters a recursive reference, it will hit the cache and does not try to
describe the type anew.

This behavior is encapsulated in the 'RecursiveTypeDescription' enum,
which represents a kind of continuation, storing all state needed to
continue traversal at the type members after the type has been registered
with the cache. (This implementation approach might be a tad over-
engineered and may change in the future)
describe the type anew. This behavior is encapsulated in the
`type_map::build_type_with_children()` function.


## Source Locations and Line Information
Expand Down Expand Up @@ -134,47 +129,3 @@ detection. The `create_argument_metadata()` and related functions take care
of linking the `llvm.dbg.declare` instructions to the correct source
locations even while source location emission is still disabled, so there
is no need to do anything special with source location handling here.

## Unique Type Identification

In order for link-time optimization to work properly, LLVM needs a unique
type identifier that tells it across compilation units which types are the
same as others. This type identifier is created by
`TypeMap::get_unique_type_id_of_type()` using the following algorithm:

1. Primitive types have their name as ID

2. Structs, enums and traits have a multipart identifier

1. The first part is the SVH (strict version hash) of the crate they
were originally defined in

2. The second part is the ast::NodeId of the definition in their
original crate

3. The final part is a concatenation of the type IDs of their concrete
type arguments if they are generic types.

3. Tuple-, pointer-, and function types are structurally identified, which
means that they are equivalent if their component types are equivalent
(i.e., `(i32, i32)` is the same regardless in which crate it is used).

This algorithm also provides a stable ID for types that are defined in one
crate but instantiated from metadata within another crate. We just have to
take care to always map crate and `NodeId`s back to the original crate
context.

As a side-effect these unique type IDs also help to solve a problem arising
from lifetime parameters. Since lifetime parameters are completely omitted
in debuginfo, more than one `Ty` instance may map to the same debuginfo
type metadata, that is, some struct `Struct<'a>` may have N instantiations
with different concrete substitutions for `'a`, and thus there will be N
`Ty` instances for the type `Struct<'a>` even though it is not generic
otherwise. Unfortunately this means that we cannot use `ty::type_id()` as
cheap identifier for type metadata -- we have done this in the past, but it
led to unnecessary metadata duplication in the best case and LLVM
assertions in the worst. However, the unique type ID as described above
*can* be used as identifier. Since it is comparatively expensive to
construct, though, `ty::type_id()` is still used additionally as an
optimization for cases where the exact same type has been seen before
(which is most of the time).
Loading

0 comments on commit 0407030

Please sign in to comment.