-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat (graphql): Datahub GMS Graphql Api Application for Querying Dataset #2071
feat (graphql): Datahub GMS Graphql Api Application for Querying Dataset #2071
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking really good! A few comments.
build.gradle
Outdated
@@ -56,6 +56,8 @@ project.ext.externalDependency = [ | |||
'gmaTestModelsDataTemplate': "com.linkedin.datahub-gma:test-models-data-template:$gmaVersion", | |||
'gmaValidators': "com.linkedin.datahub-gma:validators:$gmaVersion", | |||
'graphqlJava': 'com.graphql-java:graphql-java:16.1', | |||
'graphqlJavaSpringWebMvc': 'com.graphql-java:graphql-java-spring-boot-starter-webmvc:2.0', | |||
'graphiqlSpring': 'com.graphql-java:graphiql-spring-boot-starter:5.0.2', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used for graph visualization, correct? Is this necessary to include in DH?
datahub-gms-graphql-api/build.gradle
Outdated
compile externalDependency.antlr4 | ||
compileOnly externalDependency.lombok | ||
annotationProcessor externalDependency.lombok | ||
compile externalDependency.graphqlJavaSpringWebMvc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor organization nitpick: Place these compile statements above the annotationProcessor
and compileOnly
statements
|
||
@SuppressWarnings("checkstyle:HideUtilityClassConstructor") | ||
@SpringBootApplication(exclude = {ElasticsearchAutoConfiguration.class, RestClientAutoConfiguration.class}) | ||
public class DatahubGmsGraphQLApiApplication { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Are 'Daahub' & 'Api' necessary as part of the name here?
Consider GmsGraphQLApplication
or something similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As always naming is one of the hard part, thanks for providing suggestions here...
} | ||
|
||
@Bean | ||
public GraphQLEngine graphQLEngine() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to explicitly provide the GraphQLEngine
in addition to the GraphQL
instance, or is theGraphQL
instance enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AsGraphQLEngine
is used by the class QueryGraphQLInvocation
it needs to be provided to be Autowired in the Invocation class.
Let me know if there is a better way to do this.
.dataLoaderRegistry(register) | ||
.context(queryContext) | ||
.build(); | ||
return graphQL.executeAsync(executionInput); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we unable to leverage the execute
API that exists within GraphQLEngine
? If not, why not?
It seems that GraphQLEngine
is only being used here to help initialize the DataLoaders, is that the correct interpretation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As shown in this example https://www.graphql-java.com/tutorials/getting-started-with-spring-boot/#schema graphql-java-spring-boot-starter-webmvc
dependency works with the GraphQL
bean.
Main reason to use the GraphQLEngine
was to override the default GraphQLContext
with QueryContext
thereby supplying the dataProviders.
If we have to use the GraphQLEngine
execute In my opinion we will have to implement the low level details which is taken care for us by graphql-java-spring-boot-starter-webmvc
(https://github.com/graphql-java/graphql-java-spring/blob/master/graphql-java-spring-webmvc/src/main/java/graphql/spring/web/servlet/components/GraphQLController.java)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant to suggest is... Why not do something like the following instead:
return graphQLEngine.execute(invocationData.getQuery(), invocationData.getVariables(), queryContext);
If you did this, you likely wouldn't need both GraphQL
and GraphQLEngine
in this class, just the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it!
@@ -31,6 +33,7 @@ | |||
* | |||
* <p>In addition, it provides a simplified 'execute' API that accepts a 1) query string and 2) set of variables. | |||
*/ | |||
@Data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't make this a @DaTa class. I'd prefer everything that we expose from here to be explicit.
Instead, we can manually add getter
methods for the information you need to retrieve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense..Instead I can just @Getter
. Intention is just to reduce the noise here.
@@ -152,7 +155,7 @@ public GraphQLEngine build() { | |||
} | |||
} | |||
|
|||
private DataLoaderRegistry createDataLoaderRegistry(final Map<String, Supplier<DataLoader<?, ?>>> dataLoaderSuppliers) { | |||
public DataLoaderRegistry createDataLoaderRegistry(final Map<String, Supplier<DataLoader<?, ?>>> dataLoaderSuppliers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to expose a method called getDataLoaderRegistry
that returns a cached version of a registry that was already created?
That way, the caller does not need to explicitly pass in the dataLoaderSuppliers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree! 👍
@@ -3,7 +3,7 @@ scalar Long | |||
|
|||
schema { | |||
query: Query | |||
mutation: Mutation | |||
# mutation: Mutation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we need to comment this out? It will break the datahub-frontend
server (which extends the Mutation type) if we do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently as the mutation type has no fields, so I get a compilation error.
nested exception is graphql.schema.validation.InvalidSchemaException: invalid schema:
"Mutation" must define one or more fields.
Once we start adding fields for the mutation it should be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it, okay
settings.gradle
Outdated
include 'metadata-utils' | ||
include 'datahub-gms-graphql-api' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we consider alternative names for this module? When I read 'api' in a Java project I tend to think of a module containing public interfaces / the publicly accessible portions of a library.
Basically we should try to capture the idea that the module contains a deployable service using the name.
Alternatives could be
- datahub-gms-graphql-spring-boot
- datahub-gms-graphql-service
- datahub-gms-graphql-server
Or something similar. Interested to hear your thoughts!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datahub-gms-graphql-service
makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought considering to rename it as just datahub-gms-graphql
. Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned people will wonder what the difference is between datahub-gms-graphql
and datahub-gms-graphql-core
if there is no additional qualifier.
I think it's important to include something in the name suggesting "this is just a graphql-compatible deployable"
@jjoyce0510 Is the goal to migrate the new React FE to using this instead of the Play mid-tier graphql endpoint? Maintaining 2 stacks would be difficult. |
@jjoyce0510 Resolved most of the comments let me know what you think. |
@Override | ||
public CompletableFuture<ExecutionResult> invoke(GraphQLInvocationData invocationData, WebRequest webRequest) { | ||
QueryContext queryContext = new SpringQueryContext(true, APPNAME); | ||
DataLoaderRegistry register = graphQLEngine.createDataLoaderRegistry(graphQLEngine.get_dataLoaderSuppliers()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: registry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are still looking at the outdated code, this has been updated in the last commit.
|
||
@Override | ||
public CompletableFuture<ExecutionResult> invoke(GraphQLInvocationData invocationData, WebRequest webRequest) { | ||
QueryContext queryContext = new SpringQueryContext(true, APPNAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are passing in APPNAME here... Soon, we'll have to understand how to authorize specific actions (say, mutating the ownership of a Dataset) against a concrete human actor (like a corpUser) at some point. To do so, we'll want to have the context of the actual actor (eg. which user is making the change) to perform authorization against.
Perhaps we can come up with some standardized "super" actor that we will always let through, and if you really cannot pass the actual actor context you can use that for your use case
This may mean that in the future you may need access to that context.
@mars-lan That's not currently part of the goals. Both our use cases will share the same base graph schema & business logic required for resolving entities from GMS, but the final deployable layer will remain separate for now. The fact that we are sharing the bulk of the code should reduce the cost of maintaining separate deployment containers. I think we can continue to evaluate whether it makes sense to consolidate around what Arun's working on, but we'd need to port some additional business logic from the existing TLDR; Because we are sharing the graphql core code maintenance cost should be small. It's not out of the question in the future, but it's not in the current plan. |
@@ -33,18 +34,20 @@ | |||
* | |||
* <p>In addition, it provides a simplified 'execute' API that accepts a 1) query string and 2) set of variables. | |||
*/ | |||
@Data | |||
@Getter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just implement the getter methods manually, instead of using lombok? In the future if we added a new private final
member variable we'd automatically generate the getter, which may not be what we want (eg. if we want to hide internal impl details).
|
||
_dataLoaderSuppliers = dataLoaderSuppliers; | ||
|
||
_registry = registry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding this new private variable?
The registry should be re-created on each request. From the graphql-java
official documentation https://www.graphql-java.com/documentation/v15/batching/ ( see "Per Request Data Loaders" section )
// DataLoaderRegistry is a place to register all data loaders in that needs to be dispatched together
// in this case there is 1 but you can have many.
//
// Also note that the data loaders are created per execution request
//
DataLoaderRegistry registry = new DataLoaderRegistry();
registry.register("character", characterDataLoader);
ExecutionInput executionInput = newExecutionInput()
.query(getQuery())
.dataLoaderRegistry(registry)
.build();
For that reason, I don't include it on creation of the GraphQLEngine
, but instead recreate for each request using the dataLoaderSuppliers
.
/* | ||
* Init DataLoaderRegistry - should be created for each request. | ||
*/ | ||
DataLoaderRegistry register = createDataLoaderRegistry(_dataLoaderSuppliers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be left as is, with the registry* (typo) created on each execution
private GraphQLEngine graphQLEngine; | ||
|
||
@Bean | ||
public GraphQL graphQL() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed now, as it is unused, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As shown here - https://www.graphql-java.com/tutorials/getting-started-with-spring-boot/#schema
graphQL
bean is needed by the graphql-java-spring-boot-starter-webmvc
dependency.
@@ -34,20 +32,16 @@ | |||
* | |||
* <p>In addition, it provides a simplified 'execute' API that accepts a 1) query string and 2) set of variables. | |||
*/ | |||
@Getter | |||
public class GraphQLEngine { | |||
|
|||
private final GraphQL _engine; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually misnamed (by me)... We can change it to graphQL
Then the getter can become getGraphQL
:)
lgtm! Thanks for your contribution Arun! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all the feedback @arunvasudevan! This LGTM.
Datahub GMS GraphQL API service adds a /graphql endpoint to the gms service.
Here is the highlevel overview of the Architecture
To verify the changes in a docker use the below command
docker-compose -p datahub -f docker-compose.yml up datahub-gms-graphql-api
Endpoint: http://localhost:8091/graphql
GraphQL Query
:Checklist