Fully rework the explanation of the algorithm
Nadrieril committed Nov 5, 2023
1 parent 3349604 commit 9bfb7d9
Showing 2 changed files with 464 additions and 278 deletions.
89 changes: 46 additions & 43 deletions compiler/rustc_mir_build/src/thir/pattern/deconstruct_pat.rs
@@ -3,51 +3,54 @@
//! `Constructor` enum, a `Fields` struct, and various operations to manipulate them and convert
//! them from/to patterns.
//!
//! There's one idea that is not detailed in [`super::usefulness`] because the details are not
//! needed there: _constructor splitting_.
//! The one idea that is not detailed in [`super::usefulness`] is _constructor splitting_.
//!
//! # Constructor splitting
//! # Constructor grouping and splitting
//!
//! The idea is as follows: given a constructor `c` and a matrix, we want to specialize in turn
//! with all the value constructors that are covered by `c`, and compute usefulness for each.
//! Instead of listing all those constructors (which is intractable), we group those value
//! constructors together as much as possible. Example:
//! As explained in the corresponding section in [`super::usefulness`], to make the usefulness
//! computation tractable we need to group together constructors that have the same effect when
//! they are used to specialize the matrix.
//!
//! Example:
//! ```compile_fail,E0004
//! match (0, false) {
//!     (0 ..=100, true) => {} // `p_1`
//!     (50..=150, false) => {} // `p_2`
//!     (0 ..=200, _) => {} // `q`
//!     (0 ..=100, true) => {}
//!     (50..=150, false) => {}
//!     (0 ..=200, _) => {}
//! }
//! ```
//!
//! The naive approach would try all numbers in the range `0..=200`. But we can be a lot more
//! clever: `0` and `1` for example will match the exact same rows, and return equivalent
//! witnesses. In fact all of `0..50` would. We can thus restrict our exploration to 4
//! constructors: `0..50`, `50..=100`, `101..=150` and `151..=200`. That is enough and infinitely
//! more tractable.
//! Here we can restrict specialization to 5 cases: `0..50`, `50..=100`, `101..=150`, `151..=200`
//! and `201..`.
//!
//! We capture this idea in a function `split(p_1 ... p_n, c)` which returns a list of constructors
//! `c'` covered by `c`. Given such a `c'`, we require that all value ctors `c''` covered by `c'`
//! return an equivalent set of witnesses after specializing and computing usefulness.
//! In the example above, witnesses for specializing by `c''` covered by `0..50` will only differ
//! in their first element.
//! In [`super::usefulness`], we had said that `specialize` only takes value-only constructors. We
//! relax this restriction: we allow `specialize` to take constructors like `0..50` as long as we're
//! careful to only do that with constructors that make sense.
//!
//! We usually also ask that the `c'` together cover all of the original `c`. However we allow
//! skipping some constructors as long as it doesn't change whether the resulting list of witnesses
//! is empty or not. We use this in the wildcard `_` case.
//! For example, `specialize(0..50, (0..=100, true))` is sensible, but `specialize(50..=200,
//! (0..=100, true))` is not. The rule is that we must only use a constructor that is a subset of
//! constructors in the column (as computed by [`Constructor::is_covered_by`]). No non-trivial
//! intersections are allowed.
//!
//! Note how we only consider the first column of the match. In fact we take as input only the list
//! of the constructors of that column. We must return a set of constructors that covers the whole
//! type and is grouped as much as possible, without breaking the "must be included" rule above. The
//! precise set of invariants is described in [`SplitConstructorSet`].
//!
//! We compute this in two steps: first [`ConstructorSet::for_ty`] computes a representation of the
//! set of all possible constructors for the type. Then [`ConstructorSet::split`] looks at the
//! column of constructors and splits the set into groups accordingly.
//!
//! Constructor splitting has two interesting special cases: integer range splitting (see
//! [`IntRange::split`]) and slice splitting (see [`Slice::split`]).
//!
//! Splitting is implemented in the [`ConstructorSet::split`] function. We don't do splitting for
//! or-patterns; instead we just try the alternatives one-by-one. For details on splitting
//! wildcards, see [`ConstructorSet::split`]; for integer ranges, see [`IntRange::split`]; for
//! slices, see [`Slice::split`].
//!
//! ## Opaque patterns
//!
//! Some patterns, such as TODO, cannot be inspected, which we handle with `Constructor::Opaque`.
//! Since we know nothing of these patterns, we assume they never cover each other. In order to
//! respect the invariants of [`SplitConstructorSet`], we give each `Opaque` constructor a unique id
//! so we can recognize it.
//! Some patterns, such as constants that are not allowed to be matched structurally, cannot be
//! inspected, which we handle with `Constructor::Opaque`. Since we know nothing of these patterns,
//! we assume they never cover each other. In order to respect the invariants of
//! [`SplitConstructorSet`], we give each `Opaque` constructor a unique id so we can recognize it.
use std::cell::Cell;
use std::cmp::{self, max, min, Ordering};
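
To make the grouping described in the module documentation above concrete, here is a minimal standalone sketch of integer range splitting. It is not the compiler's `IntRange::split`: the function `split_range`, the choice of `u32`, and the use of `RangeInclusive` are assumptions made purely for illustration, and all ranges are assumed to be non-empty.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper (not part of rustc): splits `whole` into the largest
/// possible sub-ranges such that each sub-range is either fully inside or
/// fully outside every range in `column`.
/// Assumes `whole` and every range in `column` are non-empty.
fn split_range(
    whole: RangeInclusive<u32>,
    column: &[RangeInclusive<u32>],
) -> Vec<RangeInclusive<u32>> {
    let (lo, hi) = (*whole.start(), *whole.end());
    // Every group starts either at `lo` or right at a boundary of a column range.
    let mut starts = vec![lo];
    for r in column {
        let (s, e) = (*r.start(), *r.end());
        // A column range beginning strictly inside `whole` opens a new group.
        if s > lo && s <= hi {
            starts.push(s);
        }
        // A column range ending strictly before `hi` opens a new group just after it.
        if e >= lo && e < hi {
            starts.push(e + 1);
        }
    }
    starts.sort_unstable();
    starts.dedup();
    // Each group runs from its start up to just before the next start.
    let mut groups = Vec::new();
    for (i, &s) in starts.iter().enumerate() {
        let e = match starts.get(i + 1) {
            Some(&next) => next - 1,
            None => hi,
        };
        groups.push(s..=e);
    }
    groups
}
```

Under these assumptions, calling `split_range(0..=u32::MAX, &[0..=100, 50..=150, 0..=200])` yields five groups, mirroring the first column of the example match. Every value within a group matches exactly the same rows, which is why a single representative per group suffices.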
@@ -645,8 +648,8 @@ impl OpaqueId {
/// `Fields`.
#[derive(Clone, Debug, PartialEq)]
pub(super) enum Constructor<'tcx> {
/// The constructor for patterns that have a single constructor, like tuples, struct patterns
/// and fixed-length arrays.
/// The constructor for patterns that have a single constructor, like tuples, struct patterns,
/// and references. Fixed-length arrays are treated separately with `Slice`.
Single,
/// Enum variants.
Variant(VariantIdx),
@@ -851,16 +854,16 @@ pub(super) enum ConstructorSet {
/// `present` is morally the set of constructors present in the column, and `missing` is the set of
/// constructors that exist in the type but are not present in the column.
///
/// More formally, they respect the following constraints:
/// - the union of `present` and `missing` covers the whole type
/// - `present` and `missing` are disjoint
/// - neither contains wildcards
/// - each constructor in `present` is covered by some non-wildcard constructor in the column
/// - together, the constructors in `present` cover all the non-wildcard constructors in the column
/// - non-wildcards in the column do not cover anything in `missing`
/// - constructors in `present` and `missing` are split for the column; in other words, they are
/// either fully included in or disjoint from each constructor in the column. This avoids
/// non-trivial intersections like between `0..10` and `5..15`.
/// More formally, if we discard wildcards from the column, this respects the following constraints:
/// 1. the union of `present` and `missing` covers the whole type
/// 2. each constructor in `present` is covered by something in the column
/// 3. no constructor in `missing` is covered by anything in the column
/// 4. each constructor in the column is equal to the union of one or more constructors in `present`
/// 5. `missing` does not contain empty constructors (see discussion about emptiness at the top of
/// the file)
/// 6. constructors in `present` and `missing` are split for the column; in other words, they are
/// either fully included in or fully disjoint from each constructor in the column. This rules
/// out non-trivial intersections like the one between `0..10` and `5..15`.
#[derive(Debug)]
pub(super) struct SplitConstructorSet<'tcx> {
pub(super) present: SmallVec<[Constructor<'tcx>; 1]>,
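
As a rough illustration of the `present`/`missing` constraints listed above, the following sketch checks the "split for the column" property and partitions already-split groups, using plain integer ranges. The type alias `R` and the free functions `is_covered_by`, `is_disjoint`, `is_split_for` and `partition_groups` are hypothetical stand-ins, not the compiler's `Constructor` API; wildcards are assumed to have been discarded from the column already, and all ranges are assumed non-empty.

```rust
use std::ops::RangeInclusive;

// Hypothetical stand-in for an integer-range constructor.
type R = RangeInclusive<u32>;

/// `a` is fully included in `b`.
fn is_covered_by(a: &R, b: &R) -> bool {
    b.start() <= a.start() && a.end() <= b.end()
}

/// `a` and `b` have no value in common.
fn is_disjoint(a: &R, b: &R) -> bool {
    a.end() < b.start() || b.end() < a.start()
}

/// The "split" property: `c` is either fully included in or fully
/// disjoint from each constructor in the column.
fn is_split_for(c: &R, column: &[R]) -> bool {
    column.iter().all(|p| is_covered_by(c, p) || is_disjoint(c, p))
}

/// Sorts already-split groups into `(present, missing)`: a group is present
/// if some constructor in the column covers it, and missing otherwise.
fn partition_groups(groups: Vec<R>, column: &[R]) -> (Vec<R>, Vec<R>) {
    groups
        .into_iter()
        .partition(|g| column.iter().any(|p| is_covered_by(g, p)))
}
```

With the column `[0..=100, 50..=150, 0..=200]`, a range such as `50..=200` fails `is_split_for` because it partially overlaps `0..=100`, which is the kind of non-trivial intersection the constraints forbid.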