API for supporting tagged references #626

Closed
2 tasks
wks opened this issue Jul 19, 2022 · 3 comments

Comments

@wks
Collaborator

wks commented Jul 19, 2022

TL;DR: To support edge enqueuing for VMs that use tagged references, mmtk-core must be aware that an ObjectReference loaded from an edge may not be the address of an object. It may instead be a non-pointer value, which must not be traced.

TODO list:

  • Provide an API so that the VM can identify if an ObjectReference is actually a pointer to an object, and, if it is, filter out all the tag bits and leave only pointer bits.
  • Update ProcessEdgesWork and other tracing routines so that they don't trace the object (trace_object) unless it is actually a pointer.

Tagged references

Some VMs, such as Ruby and V8, use tagged references (a.k.a. tagged pointers). They borrow some bits from a pointer to denote whether the word is actually a pointer or an immediate value (such as a small integer).

For example, object references need to be aligned (to 4 or 8 bytes depending on architecture or VM requirement). Therefore, we can borrow the lowest bit as a tag. We define that if the lowest bit is 0, the word is a reference; if the lowest bit is 1, the rest of the bits represent an integer. So:

  • 00001001 10010101 11101011 11110000 represents a pointer.
  • 00000000 00000000 00000000 00110101 represents a small integer, and its value is 0b11010, which is 26 in decimal.
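The lowest-bit scheme above can be sketched in a few lines of Rust. This is only an illustration of the example encoding, not any particular VM's layout; the `Decoded` enum and the `decode` function are hypothetical names, and the sketch treats small integers as unsigned for simplicity.

```rust
/// Hypothetical decoded form of a tagged word (names are illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum Decoded {
    /// Lowest bit is 0: the word is an aligned object address.
    Pointer(usize),
    /// Lowest bit is 1: the remaining bits hold a small (here unsigned) integer.
    SmallInt(usize),
}

/// Decodes a word under the example scheme: the lowest bit is the tag.
fn decode(word: usize) -> Decoded {
    if word & 1 == 0 {
        Decoded::Pointer(word)
    } else {
        // Drop the tag bit to recover the integer payload.
        Decoded::SmallInt(word >> 1)
    }
}
```

With this sketch, the second bit pattern above (0b00110101) decodes to the small integer 26, and the first pattern (0x0995EBF0) decodes to a pointer.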

Tagged References and Edge Enqueuing

MMTk uses edge enqueuing. It enqueues the locations of reference fields into work packets to be processed later. When processing each edge, it

  1. Loads the ObjectReference from the edge
  2. Calls trace_object
  3. Optionally stores the forwarded reference back to the edge

MMTk doesn't know the value of the ObjectReference until finishing step 1. Before executing step 2, it needs to make sure it is actually a pointer to an object. mmtk-core cannot do this alone. It must ask the VM to do this.
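A minimal sketch of the three steps with the extra check inserted between steps 1 and 2 might look like the following. It is not mmtk-core code: `ObjRef` is a stand-in for ObjectReference, the slot is modelled as a plain `usize`, and the VM hook here assumes the lowest-bit tagging scheme from the example above.

```rust
/// Stand-in for mmtk-core's ObjectReference (illustrative, not the real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ObjRef(usize);

/// Hypothetical VM hook: returns Some only if the word refers to an object.
/// Assumes the lowest-bit tag scheme and treats null as "not a reference".
fn decode_tagged_reference(word: usize) -> Option<ObjRef> {
    (word != 0 && word & 1 == 0).then(|| ObjRef(word))
}

/// Processes one edge. `trace_object` returns the (possibly forwarded) reference.
fn process_edge(slot: &mut usize, mut trace_object: impl FnMut(ObjRef) -> ObjRef) {
    // 1. Load the word from the edge.
    let word = *slot;
    // Ask the VM whether the word is really a reference before tracing.
    if let Some(obj) = decode_tagged_reference(word) {
        // 2. Trace only real object references.
        let new_obj = trace_object(obj);
        // 3. Store the forwarded reference back if the object moved.
        if new_obj != obj {
            *slot = new_obj.0;
        }
    }
}
```

Slots that hold small integers or null are left untouched, and `trace_object` is never called for them.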

Proposed API

#[inline(always)]
fn decode_tagged_reference(tls: VMWorkerThread, object: ObjectReference) -> Option<ObjectReference> {
    // Default behaviour: every non-null value is treated as a reference.
    (!object.is_null()).then(|| object)
}

The decode_tagged_reference function shall return Some(objref) if object holds a reference to an object, and the objref in the returned Some(objref) should be suitable for trace_object and scan_object. It shall return None if object doesn't hold a reference to an object.

The code I proposed considers a null pointer to be a special case of a tagged pointer that doesn't hold a reference to an object. We can discuss whether we should let the VM do null testing.

I am not sure what the requirements of the returned objref are. We usually assume it is the address of the object, maybe with some offset. But does it have to be within a certain range from the beginning of the object? MMTk only does trace_object and calls scan_object. It is always the VM that loads from the object. Does it matter if the returned objref in Some(objref) still contains tag bits at high-order bits or low-order bits?

I am not sure in which trait the decode_tagged_reference function should be defined. Scanning and ObjectModel both seem appropriate.

@qinsoon
Member

qinsoon commented Jul 19, 2022

Will #573 solve this issue as well?

@wks
Collaborator Author

wks commented Jul 19, 2022

@qinsoon Maybe. I am not sure.

Yes, if we consider a slot to "use only some bits (not the whole word) to hold a reference". In this model, when we load from a slot, we clear the tag bits from the loaded word and return the address of the object as an ObjectReference. When we store an ObjectReference back, we preserve the tag bits and store only the address bits. One important change may be that the actions on an edge are no longer load and store, but load and update, where the semantics of update include preserving the bits that aren't changed instead of overwriting the entire word.
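To make the load/update idea concrete, here is a rough sketch under an assumed layout: the low 3 bits of the word are tag bits and the remaining bits are an 8-byte-aligned address. `TaggedSlot`, `TAG_MASK`, `load` and `update` are hypothetical names for illustration, not part of mmtk-core.

```rust
/// Assumed layout: the low 3 bits of a slot are tag bits,
/// the rest is an 8-byte-aligned object address.
const TAG_MASK: usize = 0b111;

/// A slot that uses only some bits of a word to hold a reference (illustrative).
struct TaggedSlot<'a>(&'a mut usize);

impl TaggedSlot<'_> {
    /// Load: clear the tag bits and return the object address,
    /// or None if the address bits are all zero.
    fn load(&self) -> Option<usize> {
        let addr = *self.0 & !TAG_MASK;
        (addr != 0).then(|| addr)
    }

    /// Update: write new address bits while preserving the existing tag bits,
    /// instead of overwriting the entire word.
    fn update(&mut self, new_addr: usize) {
        debug_assert_eq!(new_addr & TAG_MASK, 0, "address must be aligned");
        *self.0 = new_addr | (*self.0 & TAG_MASK);
    }
}
```

Note that this sketch is not atomic; it only illustrates the stop-the-world case, where no mutator can change the slot between the load and the update.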

This model works well for STW GC and for concurrent non-copying GC. Suppose instead that Ruby uses a concurrent copying GC. A field may hold a reference at one moment, but can be updated by a concurrent mutator to hold a Fixnum (small integer) at the next moment, and the GC must be careful not to overwrite that value with a reference. But concurrent copying GC usually has some kind of replication mechanism (e.g. having two versions of an object at the same time during copying), so that the GC doesn't race with mutators.

I need to think more carefully about this.

@wks
Collaborator Author

wks commented Jul 19, 2022

We discussed this in today's meeting. We can load from the edge and check if it contains a reference before we enqueue the edge.

The point is, when we are scanning an object, the object body is in the cache, so loading from it should be fast. By doing this, we only enqueue edges that actually contain object references, instead of blindly enqueuing edges and later finding that most edges don't contain references.
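As a rough illustration of filtering at scan time, the sketch below models an object's fields as a slice of words and records edges as field indices. `holds_reference` and `scan_object` here are hypothetical helpers that assume the lowest-bit tagging example from earlier in this issue; they are not the mmtk-core API.

```rust
/// Hypothetical check under the low-bit tagging example:
/// a word holds a reference if it is non-null and its lowest bit is 0.
fn holds_reference(word: usize) -> bool {
    word != 0 && word & 1 == 0
}

/// Scans an object's fields and enqueues only the slots that currently
/// hold references. Fields are modelled as a slice of words, and edges
/// are recorded as field indices.
fn scan_object(fields: &[usize], edge_queue: &mut Vec<usize>) {
    for (index, &word) in fields.iter().enumerate() {
        // The object body is being read anyway, so the check is cheap here.
        if holds_reference(word) {
            edge_queue.push(index);
        }
    }
}
```

For an object with fields [0x1000, 0b00110101, 0, 0x2008], only the first and last fields would be enqueued.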

I once thought about a similar problem in #584. That issue was about an opportunity to optimise copying GC. It has one thing in common with this issue, that is, we should load from the edges the moment we scan an object.
