-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(datajob): Datajob graphql query #2242
Conversation
private static void configureDataFlowResolvers(final RuntimeWiring.Builder builder) { | ||
builder | ||
.type("Owner", typeWiring -> typeWiring | ||
.dataFetcher("owner", new AuthenticatedResolver<>( | ||
new LoadableTypeResolver<>( | ||
CORP_USER_TYPE, | ||
(env) -> ((Owner) env.getSource()).getOwner().getUrn())) | ||
) | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need this wiring- owner: Owner
type is already configured in dataset's configure-resolvers method and dataflow + datajobs are able to re-use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll remove it!
So we basically don't need a configureDataFlowResolvers
at all then?
I got confused as I was following the configureMlModelResolvers
example, where it was configured as well. Should I remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frsann I would double check using graphiql ui before making that change but I believe that should be fine
cc @jjoyce0510 to confirm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I'll check with the IDE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gabe's correct - we should be able to remove this. We should likely refactor the configureDatasetResolvers
method to extract configuration of the "Owner" type into a dedicated method. For now, feel free to remove it here and with the ML models, I just ask that you issue a test query to verify that we can still traverse to the Owner type!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will do!
datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you have a chance to test this using graphiql? (https://github.com/graphql/graphiql)
This would let you issue graphql queries easily to your endpoint and verify things are wired up correctly.
datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java
Outdated
Show resolved
Hide resolved
datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java
Outdated
Show resolved
Hide resolved
datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java
Show resolved
Hide resolved
...hub-graphql-core/src/main/java/com/linkedin/datahub/graphql/types/dataflow/DataFlowType.java
Show resolved
Hide resolved
...l-core/src/main/java/com/linkedin/datahub/graphql/types/dataflow/mappers/DataFlowMapper.java
Outdated
Show resolved
Hide resolved
...l-core/src/main/java/com/linkedin/datahub/graphql/types/dataflow/mappers/DataFlowMapper.java
Outdated
Show resolved
Hide resolved
...hql-core/src/main/java/com/linkedin/datahub/graphql/types/datajob/mappers/DataJobMapper.java
Outdated
Show resolved
Hide resolved
...hql-core/src/main/java/com/linkedin/datahub/graphql/types/datajob/mappers/DataJobMapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks really solid.
Few top-level things:
- Can you run GQL queries against DataFlow / DataJob and paste them in the PR description under a "Validation" header? (I use POST man) -- Let me know if you need assistance with this.
- Should our modeling be so tightly coupled with Azkaban concepts? AFAIK Azkaban is not the only orchestrator / scheduler in wide use... Airflow among others exist as alternatives
- Will DataFlows / DataJobs become Browsable? If so, when?
...hql-core/src/main/java/com/linkedin/datahub/graphql/types/datajob/mappers/DataJobMapper.java
Outdated
Show resolved
Hide resolved
...hql-core/src/main/java/com/linkedin/datahub/graphql/types/datajob/mappers/DataJobMapper.java
Outdated
Show resolved
Hide resolved
""" | ||
The associated data flow | ||
""" | ||
dataFlow: DataFlow! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modeling question: What if a DataJob is "orphaned". That is, there is no parent DataFlow? (It is run in an ad-hoc manner).
Would someone want to be able to model this? If so, how would we advise they do so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we have the DataProcess
which has the the inputs/outputs of the DataJob and the orchestrator of the DataFlow. I think that would be the best option in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not familiar with DataProcess - taking a look
""" | ||
Datajob type | ||
""" | ||
type: AzkabanJobType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always an AzkabanJobType?? What if a DataJob is being run on Airflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This modeling seems too azkaban-specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Maybe something for a separate refactor PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would be forced to have backwards compatibility once we accept this one.
So can we think this through before checking in?
outputDatasets: [Dataset!] | ||
} | ||
|
||
enum AzkabanJobType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we just made this "JobType" and tried to generify it further? Alternatively we could have a union of complex objects, one per type.
Thanks, I'll get back to you on this.
I see no reason for it. Would it make sense to refactor the JobType in a separate PR, though?
Preferably yes, but we currently don't have an ETA. |
58984f9
to
c4026b5
Compare
""" | ||
The DATA_FLOW Entity | ||
""" | ||
DATA_FLOW |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking good to me so long as you've tested using real GQL queries.. You can see some samples here: https://github.com/linkedin/datahub/blob/master/datahub-gms-graphql-service/README.md
""" | ||
The associated data flow | ||
""" | ||
dataFlow: DataFlow! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not familiar with DataProcess - taking a look
@frsann Can you confirm you've validated the queries locally? |
Not yet. Focused on the es7 migration last week. Will try tonget this done this week. |
@jjoyce0510 I can now confirm that these queries work, and I've added some sample queries to the README. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you- looks good to me
type | ||
jobId | ||
dataFlow { | ||
urn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully fetching all the fields of dataFlow will work :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I just kept the example short
urn | ||
flowId | ||
} | ||
inputOutput { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this better be named just "lineage"?
if you agree, we can attend to it in a followup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, I was basically trying to map these fields as closely to the aspects names as possible. But I dont see a problem changing this if lineage becomes an established term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
Quick Question - Do we intend to add DataProcess in a future PR?
Great work. Thanks so much Fredrik!
At the latest when we start ingesting some DataProcesses of our own 😉 |
@shirshanka ok to merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
We add GraphQL Query functionality for the DataJob and DataFlow entities.
Checklist