Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat (graphql): Datahub GMS Graphql Api Application for Querying Dataset #2071

Merged
merged 11 commits into from
Feb 1, 2021

Conversation

arunvasudevan
Copy link
Collaborator

Datahub GMS GraphQL API service adds a /graphql endpoint to the gms service.

Here is the highlevel overview of the Architecture

Datahub GMS GraphQL API

To verify the changes in a docker use the below command
docker-compose -p datahub -f docker-compose.yml up datahub-gms-graphql-api

Endpoint: http://localhost:8091/graphql

GraphQL Query:

{
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)") {
    urn
    platform
    name
    origin
    description
    platformNativeType
    uri
    tags
    ownership {
      owners {
        owner {
          username
          urn
          info {
            displayName
            email
            fullName
            manager {
              urn
            }
          }
          editableInfo {
            aboutMe
            skills
          }
        }
        type
        source {
          url
        }
      }
      lastModified {
        actor
      }
    }
    created {
      actor
    }
    lastModified {
      actor
    }
  }
}

Checklist

  • [ X] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • [ X] Tests for the changes have been added/updated (if applicable)
  • [ X] Docs related to the changes have been added/updated (if applicable)

@arunvasudevan arunvasudevan changed the title feat: Datahub GMS Graphql Api feat (graphql): Datahub GMS Graphql Api Application for Querying Dataset Jan 28, 2021
Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is looking really good! A few comments.

build.gradle Outdated
@@ -56,6 +56,8 @@ project.ext.externalDependency = [
'gmaTestModelsDataTemplate': "com.linkedin.datahub-gma:test-models-data-template:$gmaVersion",
'gmaValidators': "com.linkedin.datahub-gma:validators:$gmaVersion",
'graphqlJava': 'com.graphql-java:graphql-java:16.1',
'graphqlJavaSpringWebMvc': 'com.graphql-java:graphql-java-spring-boot-starter-webmvc:2.0',
'graphiqlSpring': 'com.graphql-java:graphiql-spring-boot-starter:5.0.2',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used for graph visualization, correct? Is this necessary to include in DH?

compile externalDependency.antlr4
compileOnly externalDependency.lombok
annotationProcessor externalDependency.lombok
compile externalDependency.graphqlJavaSpringWebMvc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor organization nitpick: Place these compile statements above the annotationProcessor and compileOnly statements


@SuppressWarnings("checkstyle:HideUtilityClassConstructor")
@SpringBootApplication(exclude = {ElasticsearchAutoConfiguration.class, RestClientAutoConfiguration.class})
public class DatahubGmsGraphQLApiApplication {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Are 'Daahub' & 'Api' necessary as part of the name here?

Consider GmsGraphQLApplication or something similar

Copy link
Collaborator Author

@arunvasudevan arunvasudevan Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always naming is one of the hard part, thanks for providing suggestions here...

}

@Bean
public GraphQLEngine graphQLEngine() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to explicitly provide the GraphQLEngine in addition to the GraphQL instance, or is theGraphQL instance enough?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AsGraphQLEngine is used by the class QueryGraphQLInvocation it needs to be provided to be Autowired in the Invocation class.
Let me know if there is a better way to do this.

.dataLoaderRegistry(register)
.context(queryContext)
.build();
return graphQL.executeAsync(executionInput);
Copy link
Collaborator

@jjoyce0510 jjoyce0510 Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we unable to leverage the execute API that exists within GraphQLEngine? If not, why not?

It seems that GraphQLEngine is only being used here to help initialize the DataLoaders, is that the correct interpretation?

Copy link
Collaborator Author

@arunvasudevan arunvasudevan Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As shown in this example https://www.graphql-java.com/tutorials/getting-started-with-spring-boot/#schema graphql-java-spring-boot-starter-webmvc dependency works with the GraphQL bean.

Main reason to use the GraphQLEngine was to override the default GraphQLContext with QueryContext thereby supplying the dataProviders.

If we have to use the GraphQLEngine execute In my opinion we will have to implement the low level details which is taken care for us by graphql-java-spring-boot-starter-webmvc (https://github.com/graphql-java/graphql-java-spring/blob/master/graphql-java-spring-webmvc/src/main/java/graphql/spring/web/servlet/components/GraphQLController.java)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant to suggest is... Why not do something like the following instead:

return graphQLEngine.execute(invocationData.getQuery(), invocationData.getVariables(), queryContext); 

If you did this, you likely wouldn't need both GraphQL and GraphQLEngine in this class, just the latter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!

@@ -31,6 +33,7 @@
*
* <p>In addition, it provides a simplified 'execute' API that accepts a 1) query string and 2) set of variables.
*/
@Data
Copy link
Collaborator

@jjoyce0510 jjoyce0510 Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't make this a @DaTa class. I'd prefer everything that we expose from here to be explicit.

Instead, we can manually add getter methods for the information you need to retrieve

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense..Instead I can just @Getter. Intention is just to reduce the noise here.

@@ -152,7 +155,7 @@ public GraphQLEngine build() {
}
}

private DataLoaderRegistry createDataLoaderRegistry(final Map<String, Supplier<DataLoader<?, ?>>> dataLoaderSuppliers) {
public DataLoaderRegistry createDataLoaderRegistry(final Map<String, Supplier<DataLoader<?, ?>>> dataLoaderSuppliers) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to expose a method called getDataLoaderRegistry that returns a cached version of a registry that was already created?

That way, the caller does not need to explicitly pass in the dataLoaderSuppliers

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree! 👍

@@ -3,7 +3,7 @@ scalar Long

schema {
query: Query
mutation: Mutation
# mutation: Mutation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need to comment this out? It will break the datahub-frontend server (which extends the Mutation type) if we do

Copy link
Collaborator Author

@arunvasudevan arunvasudevan Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently as the mutation type has no fields, so I get a compilation error.

nested exception is graphql.schema.validation.InvalidSchemaException: invalid schema:
"Mutation" must define one or more fields.

Once we start adding fields for the mutation it should be good.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, okay

settings.gradle Outdated
include 'metadata-utils'
include 'datahub-gms-graphql-api'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider alternative names for this module? When I read 'api' in a Java project I tend to think of a module containing public interfaces / the publicly accessible portions of a library.

Basically we should try to capture the idea that the module contains a deployable service using the name.

Alternatives could be

  • datahub-gms-graphql-spring-boot
  • datahub-gms-graphql-service
  • datahub-gms-graphql-server

Or something similar. Interested to hear your thoughts!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datahub-gms-graphql-service makes sense to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought considering to rename it as just datahub-gms-graphql. Any thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned people will wonder what the difference is between datahub-gms-graphql and datahub-gms-graphql-core if there is no additional qualifier.

I think it's important to include something in the name suggesting "this is just a graphql-compatible deployable"

@mars-lan
Copy link
Contributor

@jjoyce0510 Is the goal to migrate the new React FE to using this instead of the Play mid-tier graphql endpoint? Maintaining 2 stacks would be difficult.

@arunvasudevan
Copy link
Collaborator Author

@jjoyce0510 Resolved most of the comments let me know what you think.

@Override
public CompletableFuture<ExecutionResult> invoke(GraphQLInvocationData invocationData, WebRequest webRequest) {
QueryContext queryContext = new SpringQueryContext(true, APPNAME);
DataLoaderRegistry register = graphQLEngine.createDataLoaderRegistry(graphQLEngine.get_dataLoaderSuppliers());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: registry

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are still looking at the outdated code, this has been updated in the last commit.


@Override
public CompletableFuture<ExecutionResult> invoke(GraphQLInvocationData invocationData, WebRequest webRequest) {
QueryContext queryContext = new SpringQueryContext(true, APPNAME);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are passing in APPNAME here... Soon, we'll have to understand how to authorize specific actions (say, mutating the ownership of a Dataset) against a concrete human actor (like a corpUser) at some point. To do so, we'll want to have the context of the actual actor (eg. which user is making the change) to perform authorization against.

Perhaps we can come up with some standardized "super" actor that we will always let through, and if you really cannot pass the actual actor context you can use that for your use case

This may mean that in the future you may need access to that context.

@jjoyce0510
Copy link
Collaborator

jjoyce0510 commented Jan 29, 2021

Is the goal to migrate the new React FE to using this instead of the Play mid-tier graphql endpoint? Maintaining 2 stacks would be difficult.

@mars-lan That's not currently part of the goals. Both our use cases will share the same base graph schema & business logic required for resolving entities from GMS, but the final deployable layer will remain separate for now. The fact that we are sharing the bulk of the code should reduce the cost of maintaining separate deployment containers.

I think we can continue to evaluate whether it makes sense to consolidate around what Arun's working on, but we'd need to port some additional business logic from the existing datahub-frontend pertaining to authentication (the login module, auth cookie creation, etc), which @arunvasudevan has made clear he does not need for his use case.

TLDR; Because we are sharing the graphql core code maintenance cost should be small. It's not out of the question in the future, but it's not in the current plan.

@@ -33,18 +34,20 @@
*
* <p>In addition, it provides a simplified 'execute' API that accepts a 1) query string and 2) set of variables.
*/
@Data
@Getter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just implement the getter methods manually, instead of using lombok? In the future if we added a new private final member variable we'd automatically generate the getter, which may not be what we want (eg. if we want to hide internal impl details).


_dataLoaderSuppliers = dataLoaderSuppliers;

_registry = registry;
Copy link
Collaborator

@jjoyce0510 jjoyce0510 Jan 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we adding this new private variable?

The registry should be re-created on each request. From the graphql-java official documentation https://www.graphql-java.com/documentation/v15/batching/ ( see "Per Request Data Loaders" section )

    // DataLoaderRegistry is a place to register all data loaders in that needs to be dispatched together
    // in this case there is 1 but you can have many.
    //
    // Also note that the data loaders are created per execution request
    //
    DataLoaderRegistry registry = new DataLoaderRegistry();
    registry.register("character", characterDataLoader);

    ExecutionInput executionInput = newExecutionInput()
            .query(getQuery())
            .dataLoaderRegistry(registry)
            .build();

For that reason, I don't include it on creation of the GraphQLEngine, but instead recreate for each request using the dataLoaderSuppliers.

/*
* Init DataLoaderRegistry - should be created for each request.
*/
DataLoaderRegistry register = createDataLoaderRegistry(_dataLoaderSuppliers);
Copy link
Collaborator

@jjoyce0510 jjoyce0510 Jan 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be left as is, with the registry* (typo) created on each execution

private GraphQLEngine graphQLEngine;

@Bean
public GraphQL graphQL() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed now, as it is unused, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As shown here - https://www.graphql-java.com/tutorials/getting-started-with-spring-boot/#schema
graphQL bean is needed by the graphql-java-spring-boot-starter-webmvc dependency.

@@ -34,20 +32,16 @@
*
* <p>In addition, it provides a simplified 'execute' API that accepts a 1) query string and 2) set of variables.
*/
@Getter
public class GraphQLEngine {

private final GraphQL _engine;
Copy link
Collaborator

@jjoyce0510 jjoyce0510 Jan 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually misnamed (by me)... We can change it to graphQL

Then the getter can become getGraphQL :)

@jjoyce0510
Copy link
Collaborator

lgtm! Thanks for your contribution Arun!

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all the feedback @arunvasudevan! This LGTM.

@shirshanka shirshanka merged commit 84e952e into datahub-project:master Feb 1, 2021
@arunvasudevan arunvasudevan deleted the gmsGraphQLApi branch March 4, 2022 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants