-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul interning. #93148
Overhaul interning. #93148
Conversation
Some changes occurred in cc @camelid |
rustc_middle::ty::Uniq
.rustc_middle::ty::Uniq
.
@michaelwoerister Ignore the first two commits, they come from #93147, which this PR builds upon. The third commit is the one of interest. This PR was inspired by (a) my finding the interned types very hard to understand, (b) this GitHub comment, and (c) this Zulip comment. We could conceivably merge this as is, but it's probably worth converting some of the more heavily used interned types to make sure there aren't any unexpected obstacles. @michaelwoerister I'd appreciate feedback about whether you think this is heading in the right direction. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
related to this, i've wrote https://lcnr.de/blog/low-effort-interner/ where I ended up with a type using the same name ^^ using the following type from the "perrrrrrf" section in my blog with edit 2. might be worth considering:
mod private {
pub struct ProofToken;
}
#[repr(transparent)]
pub struct Uniq<T>(pub T, pub private::ProofToken);
With this, the Eq
and PartialEq
impls don't need to temporarily use &&T
which, while probably easily getting optimized out, might slightly improve perf. More importantly for me, you can now access the T
while pattern matching without being able to mutate it, as it will always be behind a immutable reference.
compiler/rustc_middle/src/ty/mod.rs
Outdated
/// will cause incorrect behaviour, though it shouldn't cause undefined | ||
/// behaviour. For this reason it is marked `unsafe`. | ||
unsafe fn new(ptr: &'tcx T) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
considering that getting this wrong is memory safe, i would prefer
fn new_unchecked(ptr: &'tcx T) -> Self {
here instead 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my stash I've written these methods, that should cover a lot of use cases, preventing new_unchecked
callsites.
pub unsafe fn new_unchecked(inner: &'tcx T) -> Self {
Uniq(inner)
}
pub fn from_arena_interner(
value: T,
arena: &'tcx WorkerLocal<Arena<'tcx>>,
interner: &ShardedHashMap<&'tcx T, ()>,
) -> Self {
let inner = interner.intern(value, |v| arena.alloc(v));
Uniq(inner)
}
pub fn from_make<K, F: FnOnce(K) -> T>(
key: K,
make: F,
arena: &'tcx WorkerLocal<Arena<'tcx>>,
interner: &ShardedHashMap<&'tcx T, ()>,
) -> Self {
let inner = interner.intern(key, move |key| arena.alloc(make(key)));
Uniq(inner)
}
I'll reserve some time early next week to take a closer look. I think it's great to try and make the interning infrastructure more robust against accidentally violating invariants! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a great idea!
src/librustdoc/clean/types.rs
Outdated
@@ -380,7 +380,7 @@ crate fn rustc_span(def_id: DefId, tcx: TyCtxt<'_>) -> Span { | |||
} | |||
|
|||
impl Item { | |||
crate fn stability<'tcx>(&self, tcx: TyCtxt<'tcx>) -> Option<&'tcx Stability> { | |||
crate fn stability<'tcx>(&self, tcx: TyCtxt<'tcx>) -> Option<Uniq<'tcx, Stability>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the function below dereferences the pointer. I didn't realize it was interned, so I thought it was unnecessary to return a reference and had @GuillaumeGomez remove the reference. Maybe convert it back to a reference (with Uniq
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand this comment. Can you elaborate, or point me at the earlier PR? Thanks.
|
||
impl<'tcx> TyCtxt<'tcx> { | ||
pub fn $method(self, v: $ty) -> Uniq<'tcx, $ty> { | ||
let v = self.interners.$name.intern(v, |v| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should intern
just return Uniq
directly?
Lots of good comments above, this is clearly an idea whose time has come :) I will do more on this next week. |
☔ The latest upstream changes (presumably #93245) made this pull request unmergeable. Please resolve the merge conflicts. |
Yes, this definitely goes in the right direction! I'd even go a step further and try to hide away all the implementation details of interning in a single, generic implementation in
struct InternedSet<'arena, T: Hash + Eq> {
inner: ShardedHashMap<&'arena T, ()>
}
impl InternedSet<'arena, T: Hash + Eq> {
pub fn intern<Q>(&self, value: Q, move_to_arena: impl FnOnce(Q) -> &'arena T) -> Uniq<'arena, T> {
// ...
}
} It's entirely possible that I'm overlooking some complications here -- but that's roughly what I had in mind for cleaning things up. |
Yes, it sounds like a good idea to add the new interning infrastructure in parallel to the old one and then incrementally move things over, until the old infrastructure can be removed. But trying to move the most complicated case as soon as possible, sounds like a good idea too. After that it should be easy I have a hunch that all of this looks a lot more complicated than it actually is, just because it has grown organically over the years. |
All that makes sense. The latest update doesn't do any of that yet, though. Instead I started again from scratch with I changed The commits have lots of changes like this:
and this:
I.e. all the functionality is moved off Anyway, this is where I'm up to. As usual, comments are welcome! Thanks. |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit d75c38ea9a96f53a9eae88b29c335cd28e815f51 with merge 43ea0bf0847986569308356f1623c124b2a46a28... |
This comment has been minimized.
This comment has been minimized.
Interesting! What are the concrete consequences of that? |
By the way, a cross cutting change like this will probably need a major change proposal. |
Probably the main one is that Other than that it's not a big difference, I don't think. Switching |
📌 Commit 80632de has been approved by |
☀️ Test successful - checks-actions |
Finished benchmarking commit (5569757): comparison url. Summary: This benchmark run shows 198 relevant improvements 🎉 but 12 relevant regressions 😿 to instruction counts.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression |
/// A reference to a value that is interned, and is known to be unique. | ||
/// | ||
/// Note that it is possible to have a `T` and a `Interned<T>` that are (or | ||
/// refer to) equal but different values. But if you have two different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the last word on this line be "equal"? Or I am really confused.
Also "equal but different" sounds rather odd. The key is the location in memory, but that is only first mentioned later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
Note that it is possible to have a
T
and aInterned<T>
to be equal but that are separate values at different addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that makes more sense. I would probably use &T
since that corresponds better with Interned<T>
(and talking about addresses only makes sense with references anyway).
Update rustc to 3/6 nightly. The only breaking change comes from rust-lang/rust#93148.
Update rustc to 3/6 nightly. The only breaking change comes from rust-lang/rust#93148, which a warning from rust-lang/cargo#10245.
A number of types are interned and
eq
andhash
are implemented onthe pointer rather than the contents. But this is not well enforced
within the type system like you might expect.
This PR introduces a new type
Interned
which encapsulates this conceptmore rigorously, and uses it to convert a couple of the less common
interned types.
r? @fee1-dead