Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attachment implementation to return value from visitor methods #21

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

NathanQingyangXu
Copy link
Contributor

@NathanQingyangXu NathanQingyangXu commented Dec 3, 2024

https://jira.mongodb.org/browse/HIBERNATE-32

try to simulate the similar stuff in POC project.

Copy link
Collaborator

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick initial feedback.

}

@VisibleForTesting(otherwise = PRIVATE)
final Deque<Attachment<?>> state = new ArrayDeque<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the Deque until we prove that it's actually needed. The design doc doesn't mention it, and it wasn't needed in the POC (which covered a good bit of the query API)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I overthought. The POC implementation is good enough and it covers everything already.

@stIncMale stIncMale self-requested a review December 4, 2024 23:01
Copy link
Member

@stIncMale stIncMale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expressing a few concerns without having reviewed the whole PR.

*/
public interface AttachmentKey<T> {
AttachmentKey<String> COLUMN_NAME = new AttachmentKey<>() {};
AttachmentKey<String> COLLECTION_NAME = new AttachmentKey<>() {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toString of such objects does not seem to be descriptive, yet it is used in Attachment.expect (
throw new IllegalArgumentException(format("No value attached for attachment key: %s", attachmentKey))). Let's improve that.

Copy link
Contributor Author

@NathanQingyangXu NathanQingyangXu Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Ideally the output not only includes key name, but also the type info.

Given the keys will be added incrementally, it would be great to make the addition as easy as possible (without manual labour overhead).

I come up with the following idea to output both generic type info and names using Java reflection:

    default String render() {
        return Arrays.stream(AttachmentKey.class.getDeclaredFields())
                .filter(field -> {
                    try {
                        return field.get(null) == this;
                    } catch (IllegalAccessException e) {
                        throw new RuntimeException(e);
                    }
                })
                .findFirst()
                .map(field -> format("%s: %s", field.getGenericType(), field.getName()))
                .orElse(toString());
    }

so for the following new key constant:

AttachmentKey<List<Map<String, Object>>> NO_NAME = new AttachmentKey<>() {};

the above method will output the following info:

AttachmentKey<java.util.List<java.util.Map<java.lang.String, java.lang.Object>>>: NO_NAME

Not sure whether it is over-engineering or not.

In normal visitor usage scenario, usually type matching is enough. It is rare to return multiple different result from the same visitxxx(..) method (by conditional branching) for if that is the case, a better design is to split it into multiple different visitor methods instead.

* @see AttachmentKey
*/
@SuppressWarnings("unchecked")
public final class Attachment {
Copy link
Member

@stIncMale stIncMale Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think, the "attachment" term is a misnomer here. It came from RetryState.attachment, but here the idea is very different, despite the implementation being similar. Attachment exists to allow yielding a value from a void method in a situation when we can't make it non-void. The fact that this is implemented by writing that value into the instance field of Attachment (which one may see as "attaching") is not of importance, certainly not on importance to the extent that it dictates the naming.
  2. This mechanism, allowing us to yield values from void methods, is not coupled with the SQL AST walker, so I propose us to remove the story about the walker, which is essentially the whole class-level documentation of Attachment.

Copy link
Contributor Author

@NathanQingyangXu NathanQingyangXu Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I read the simulated stuff in Java driver (mostly written by you), I agree Attachment is inappropriate here. It was used for storing retry state related common states. In our usage, we have different requirements, including:

  1. we store one single value for the visitor method returning void in question; attachment is barely feasible in the sense that we did store value somewhere
  2. A more subtle requirement to restore previous state after finishing the current visitor method to accommodate tree depth-first traversal, which is the salient feature and we should make it explicit in doc; it is based on a clever trick that we spread ancestor states within the various method calling stack trace corresponding to the visitor methods of its various ancestors.

import org.jspecify.annotations.Nullable;

/**
* Manages global state while translating SQL AST via {@link org.hibernate.sql.ast.SqlAstWalker}.
Copy link
Member

@stIncMale stIncMale Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Manages global state" - I don't think that is correct. This utility provides a way to yield values from void methods. It manages only its own object-local state, just like any other stateful object, and this does not require any special mentions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree with this, given the existence of fields like AttachmentKey#COLUMN_NAME. We could make it true though by moving those static AttachmentKey instances into a separate class that is more closely tied to the AST walker.

*/

@NullMarked
package com.mongodb.hibernate.translate.attachment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this package part of our API? If not, then let's make it internal.

The same question is applicable to com.mongodb.hibernate.translate.

* @param <T> the expected Java generic type
*/
public <T> T expect(AttachmentKey<T> attachmentKey, Runnable attacher) {
var previousAttachmentKey = this.attachmentKey;
Copy link
Member

@stIncMale stIncMale Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it true that previousAttachmentKey is always null, provided that Attachment.expect is not called concurrently with itself (note that a such a thing may happen even within the same thread of execution)?

I suspect, that all we need is to assert here that this.attachmentKey is null, and we don't need previousAttachmentKey.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, previouslyAttachmentKey could be anything. We use it to cover such common AST visiting pattern: one visitor method calls another one, which calls another one, ..... AST tree depth could be ad-hoc, sub query is a good example.

That is why I originally used Stack to record the tree traversal (post-order for we attach value bottom-up), but later on I realize Jeff's implementation has covered the case of backtracking by reverting back previous states.

So suppose in a deeply nested visitor method calling, all the "previous" state will be stored in their corresponding statck trace local variables. This seems an elegant solution (though it takes some getting used to and deep understanding, which could confuse new coder invariably).

* {@code null}
* @param <T> the expected Java generic type
*/
public <T> T expect(AttachmentKey<T> attachmentKey, Runnable attacher) {
Copy link
Member

@stIncMale stIncMale Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that neither the method name nor the parameter names here express what they are. This method executes a Runnable action, and allows that action to yield a value, which is returned by this method. None of this is expressed by the expect name of the method, and attacher does not express that it's the action we are executing.

Following is an example of an alternative naming we may consider:

SideChannel {
    <T> T execute(ValueDescriptor<T> valueDescriptor, Runnable actionYieldingValue);
    
    void yield(ValueDescriptor<T> valueDescriptor, T value)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the naming was by Jeff in POC project. It is challenging to come up with perfect naming, tbh. In tech doc you mentioned ReturnType in lieu of AttachmentKey.

@jyemin , I think his naming seems a little bit better

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have zero attachment (pun intended) to the names used in the POC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me that this interface was introduced only to be mentioned in the story written at the class level of Attachment. If we end up deleting that story as suggested in https://github.com/mongodb/mongo-hibernate/pull/21/files#r1870405938, we should also remove this interface from this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it could and should be deleted.

}

/**
* Detaches the attached value by invoking some visitor method which returns {@code void} but internally the method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "detaches" part (this.attachedValue = null) is conceptually an implementation detail, it cannot in principle be part of the method contract because it is not an observable behavior, provided that the expect methods is not called concurrently with itself. Thus, it should not be mentioned or even alluded to in the documentation.

}

@VisibleForTesting(otherwise = VisibleForTesting.AccessModifier.PRIVATE)
boolean isBlank() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this method, even for tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A key feature of Attachment is to support embedded attaching (there is a unit testing case to demonstrate that). I used this method to verify that after usage, its initial blank state should be reverted back. That covers the case that some dangling data lingers, e.g.:

  • Runnalbe throws an exception and our Attachment didn't restore previous state;
  • deeply nested attachting didn't work as expected due to defective "previous state restoration"

tbh, all these Attachment complexity is really difficult to understand and use. Without a perfect unit testing coverage, it is so error-prone.

Copy link
Collaborator

@jyemin jyemin Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh, all these Attachment complexity is really difficult to understand and use.

Do we have a better alternative though?

Copy link
Contributor Author

@NathanQingyangXu NathanQingyangXu Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have an obvious choice here. A common issue with common solution. E.g. see https://stackoverflow.com/questions/1711797/how-to-store-state-on-a-node-for-a-visitor-pattern

It is also explained in detail in one book I read about ANTLR library (https://pragprog.com/titles/tpantlr2/the-definitive-antlr-4-reference/). Ultimately the reason Hibernate uses such visitor pattern is simply due to ANTLR library (which generates visitor API as the first vistior API in Hibernate; Hibernate simulates it to accommodate its SQL translation step. Actually Hibernate uses such visitor walker more than once, including:

  1. SQM AST -> SQL AST (e.g. https://github.com/hibernate/hibernate-orm/blob/main/hibernate-core/src/main/java/org/hibernate/query/sqm/sql/BaseSqmToSqlAstConverter.java). Return type is used for the various visitor methods for Hibernate has internal SQL AST model. Below is the method to convert SqmUpdateStatement to SqmUpdateStatement:
Screenshot 2024-12-09 at 2 36 33 PM
  1. SQL AST -> SQL: given the target is flat SQL string, so it is natural Hibernate opts for void visitor methods, which is not enough for us for we have intermediate mongo AST

I explored whether we could make the above second item to return type as in SQM AST -> SQL AST case, but it has long-term maintainability issue so we'd better adhere to Hibernate's existing walker API without customization.

The most natural way to solve the issue is as follows:

  • maintain a global map from SQL AST tree node reference to its corresponding visitor method's returned value; ANTLR book even provides the detail to use IdentityHashMap instead of HashMap for opt.
  • when we visit an AST node in its visit method, we adopt post-order traversal by visiting its children nodes first, then we look up in the global map to get their values, then we finish computing the returned value for the current node and store it into global map as well.

There is still complexity like ensuring type safety but I want to say it enjoys the biggest benefit: easy understanding and easy maintaining. Suppose I am community contributor, I would feel easy to get the hang of the subtlety without firstly scratching my hairs to figure out the complicated Attachment mechanism (including its subtle old state restoration detail, which relies on stacktrace local variables to play hand in hand with method calling recursion or depth-first tree travseral).

return (T) this.attachedValue;
} finally {
// revert back old state
this.attachmentKey = previousAttachmentKey;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, this is equivalent to this.attachmentKey = null, and is needed only to allow us to use an instance of Attachment for executing more than one Runnable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as explained previously, it is used for visiting AST tree with ad-hoc depth. We need to restore the previous ancestor node state for we only have one single flat field to store the 'current state'.

}
if (this.attachedValue != null) {
throw new IllegalStateException(format(
"Current attachment with key [%s] has attached value [%s] already when another new value is provided [%s]",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Attachment is not part of API, which I suspect is true, all the checks in Attachment are assertions. So if this is not part of API, let's make them assertions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is definitively not part of API

Copy link
Member

@stIncMale stIncMale Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking more about this API, I came up with the following proposal: YieldingRunnable (the full commit is stIncMale@0bdc3ac).

The usage looks like this:

String result = YieldingRunnable.run(ValueDescriptors.COLLECTION_NAME, yieldableValue -> {
    voidMethod();
    yieldableValue.yield(ValueDescriptors.COLLECTION_NAME, "my string");
});

See tests for the description of the features.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look. Seems the ValueDescriptors is pretty good. But the yieldableValue might not be very useful for the yielding is supposed to be happening within the voidMethod() or visitor method, that is the key challenge here. We have to yield value without returning from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could grab the ValueDescriptor from the storage per se within the void visitor method, but that is not the design. The design here is to provide such AttachmentKey when some value is attached for proactive key validation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants