API for supporting tagged references #626

wks · 2022-07-19T04:07:34Z

TL;DR: To support edge enqueuing for VMs that use tagged references, mmtk-core must also be aware that the ObjectReference loaded from an edge may not be the address of an object, but may be a non-pointer value which must not be traced.

TODO list:

Provide an API so that the VM can identify whether an ObjectReference is actually a pointer to an object and, if it is, filter out all the tag bits and leave only the pointer bits.
Update ProcessEdgesWork and other tracing routines so that they call trace_object only when the loaded value is actually a pointer.

Tagged references

Some VMs, such as Ruby and V8, use tagged references (a.k.a. tagged pointers). They borrow some bits from a pointer to denote whether the word is actually a pointer or a value (such as a small integer).

For example, object references need to be aligned (to 4 or 8 bytes depending on architecture or VM requirement). Therefore, we can borrow the lowest bit as a tag. We define that if the lowest bit is 0, the word is a reference; if the lowest bit is 1, the rest of the bits represent an integer. So

00001001 10010101 11101011 11110000 represents a pointer.
00000000 00000000 00000000 00110101 represents a small integer, and its value is 0b11010, which is 26 in decimal.
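
To make the example concrete, here is a minimal sketch of the lowest-bit scheme described above. The names `Decoded`, `decode` and `encode_small_int` are made up for illustration and are not part of mmtk-core or any VM. It also treats the integer as unsigned for simplicity; a real VM would more likely use a signed (arithmetic) shift.

```rust
/// Hypothetical decoded form of a tagged word (illustration only).
#[derive(Debug, PartialEq, Eq)]
pub enum Decoded {
    /// Lowest bit is 0: the word is an aligned object address.
    Pointer(usize),
    /// Lowest bit is 1: the remaining bits hold a small integer.
    SmallInt(usize),
}

/// Decode a word under the lowest-bit tagging scheme.
pub fn decode(word: usize) -> Decoded {
    if word & 1 == 0 {
        Decoded::Pointer(word)
    } else {
        // Drop the tag bit to recover the integer payload.
        Decoded::SmallInt(word >> 1)
    }
}

/// Encode a small integer by shifting it left and setting the tag bit.
pub fn encode_small_int(value: usize) -> usize {
    (value << 1) | 1
}
```

With these definitions, the second bit pattern above (0b00110101) decodes to `Decoded::SmallInt(26)`, and `encode_small_int(26)` gives back the same word.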

Tagged References and Edge Enqueuing

MMTk uses edge enqueuing. It enqueues the locations of reference fields into work packets to be processed later. When processing each edge, it does the following:

1. Load the ObjectReference from the edge
2. Call trace_object
3. Optionally store the forwarded reference back to the edge

MMTk doesn't know the value of the ObjectReference until finishing step 1. Before executing step 2, it needs to make sure it is actually a pointer to an object. mmtk-core cannot do this alone. It must ask the VM to do this.
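
The edge-processing steps with the extra check could be sketched roughly as follows. This is only an illustration under simplifying assumptions: `ObjectReference` is reduced to a wrapper around a raw word, `vm_decode` is a hypothetical VM-side hook (null and odd words are assumed to be non-references), and `trace_object` is passed in as a closure rather than being a real plan method.

```rust
/// Stand-in for mmtk-core's ObjectReference, simplified to a raw word.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ObjectReference(pub usize);

/// Hypothetical VM-side hook: Some(reference) for real pointers, None otherwise.
pub fn vm_decode(raw: ObjectReference) -> Option<ObjectReference> {
    // Assumed scheme: null and odd (tagged) words are not references.
    if raw.0 == 0 || raw.0 & 1 == 1 {
        None
    } else {
        Some(raw)
    }
}

/// Process one edge. `trace_object` stands in for the plan's tracing routine
/// and returns the (possibly forwarded) reference.
pub fn process_edge(
    slot: &mut usize,
    trace_object: &mut dyn FnMut(ObjectReference) -> ObjectReference,
) {
    // Step 1: load the word from the edge.
    let raw = ObjectReference(*slot);
    // Ask the VM whether it is a pointer before going to step 2.
    if let Some(object) = vm_decode(raw) {
        // Step 2: trace the object.
        let new_object = trace_object(object);
        // Step 3: store the forwarded reference back if the object moved.
        if new_object != object {
            *slot = new_object.0;
        }
    }
    // Non-pointer values are left untouched and never traced.
}
```

A slot holding a small integer or null passes through `process_edge` unchanged, and the tracing closure is never invoked for it.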

Proposed API

#[inline(always)]
fn decode_tagged_reference(tls: VMWorkerThread, object: ObjectReference) -> Option<ObjectReference> {
    // Default: null is not a reference; any other value is returned unchanged.
    (!object.is_null()).then(|| object)
}

decode_tagged_reference shall return Some(objref) if object holds a reference to an object, where objref is suitable for trace_object and scan_object. It shall return None if object doesn't hold a reference to an object.

The code I proposed considers a null pointer as a special case of a tagged pointer that doesn't hold a reference to an object. We can discuss whether we should let the VM do null testing.

I am not sure what the requirements of the returned objref are. We usually assume it is the address of the object, maybe with some offset. But does it have to be within a certain range from the beginning of the object? MMTk only does trace_object and calls scan_object. It is always the VM that loads from the object. Does it matter if the returned objref in Some(objref) still contains tag bits at high-order bits or low-order bits?

I am not sure in which trait the decode_tagged_reference function should be defined. Scanning and ObjectModel both seem appropriate.


qinsoon · 2022-07-19T04:48:30Z

Will #573 solve this issue as well?

wks · 2022-07-19T08:17:28Z

@qinsoon Maybe. I am not sure.

Yes, if we consider a slot to "use only some bits (not the whole word) to hold a reference". In this way, when we load from a slot, we clear the tag bits from the loaded word and return the address of the object as an ObjectReference. When we store an ObjectReference back, we preserve the tag bits and store only the address bits. One important change may be that the actions on an edge are no longer load and store, but load and update, where the semantics of update include preserving the bits that aren't changed instead of overwriting the entire word.
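
A rough sketch of the load/update semantics described above, assuming (purely for illustration) that the low 3 bits of a slot are tag bits. `TaggedSlot` and `TAG_MASK` are hypothetical names, not mmtk-core types.

```rust
/// Assumed layout for illustration: the low 3 bits of a slot are tag bits.
const TAG_MASK: usize = 0b111;

/// A hypothetical slot wrapper; not an mmtk-core type.
pub struct TaggedSlot<'a>(pub &'a mut usize);

impl<'a> TaggedSlot<'a> {
    /// Load: clear the tag bits and return only the object address.
    pub fn load(&self) -> usize {
        *self.0 & !TAG_MASK
    }

    /// Update: write the new address while preserving the existing tag bits.
    pub fn update(&mut self, new_address: usize) {
        debug_assert_eq!(new_address & TAG_MASK, 0, "address must be aligned");
        let tag = *self.0 & TAG_MASK;
        *self.0 = new_address | tag;
    }
}
```

The point of `update` over a plain store is that the caller never sees or reconstructs the tag bits; they stay in the slot across forwarding.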

This model works well for STW GC, or concurrent non-copying GC. Suppose Ruby uses a concurrent copying GC. A field may hold a reference at one moment, but can be updated by a concurrent mutator to hold a Fixnum (small integer) at the next moment, and the GC must be careful not to overwrite that Fixnum with a forwarded reference. But concurrent copying GC usually has some kind of replication mechanism (e.g. having two versions of an object at the same time during copying), so that the GC doesn't race with mutators.
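
To illustrate the race, one possible way (not something proposed in this thread, just an assumption for the sake of the example) to avoid clobbering a concurrently written value is to update the slot with a compare-and-swap, so the forwarded reference is only written if the slot still holds the word the GC originally loaded. `try_forward` is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Write `new_word` only if the slot still contains `old_word`.
/// Returns true if the slot was updated, false if a mutator changed it first.
pub fn try_forward(slot: &AtomicUsize, old_word: usize, new_word: usize) -> bool {
    slot.compare_exchange(old_word, new_word, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}
```

If a mutator stores a Fixnum into the slot between the GC's load and its update, `try_forward` fails and the Fixnum is left in place.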

I need to think more carefully about this.

wks · 2022-07-19T10:42:06Z

We discussed this in today's meeting. We can load from the edge and check if it contains a reference before we enqueue the edge.

The point is, when we are scanning an object, the object body is in the cache, so loading from it should be fast. By doing this, we only enqueue edges that actually contain object references, instead of blindly enqueuing edges and later finding that most edges don't contain references.
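
As a sketch of filtering at scan time: the object is modelled as a slice of words, and `holds_reference` is a hypothetical check assuming null and odd words are not references. Slots are identified by field index here rather than by address, to keep the example self-contained.

```rust
/// Hypothetical check, assuming null and odd (tagged) words are not references.
fn holds_reference(word: usize) -> bool {
    word != 0 && word & 1 == 0
}

/// Scan an object's fields and return the indices of the slots worth enqueuing.
pub fn slots_to_enqueue(fields: &[usize]) -> Vec<usize> {
    let mut slots = Vec::new();
    for (index, &word) in fields.iter().enumerate() {
        // Load while the object body is likely still in cache,
        // and skip slots that hold no reference.
        if holds_reference(word) {
            slots.push(index);
        }
    }
    slots
}
```

For an object with fields `[0x1000, 0x35, 0, 0x2008]`, only the first and last slots would be enqueued.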

I once thought about a similar problem in #584. That issue was about an opportunity to optimise copying GC. It has one thing in common with this issue, that is, we should load from the edges the moment we scan an object.

udesou closed this as completed Nov 13, 2023

This was referenced Nov 30, 2023

Non-null slots (Edge) that don't contain object references, either. #1031

Closed

Proposal: Edge::update, a single method to read-modify-write a slot (Edge) #1033

Open
