feat(datajob): Backend implementation #2197
"flowId^4",
"orchestrator",
"cluster"
I was a bit unsure about this. Improvement suggestions welcome.
"name^4",
"dataFlow"
I was a bit unsure about this. Improvement suggestions welcome.
metadata-models/src/main/pegasus/com/linkedin/metadata/relationship/Consumes.pdl
Outdated
metadata-builders/src/main/java/com/linkedin/metadata/builders/search/DataFlowIndexBuilder.java
Outdated
metadata-builders/src/main/java/com/linkedin/metadata/builders/search/DataJobIndexBuilder.java
Nice work. This looks good to me, aside from a few comments about browse paths. You may also want to consider using valid urns in your tests: even if they work now, this could be a time bomb that goes off if someone adds a validation layer later down the line.
Thanks @gabe-lyons!
Which urns are not valid?
@frsann I realize now that the urns should be valid. When I first looked, I assumed they needed to be
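To make the "time bomb" concern above concrete, here is a minimal sketch of the kind of guard a test could use to fail fast on malformed urns. It assumes the general shape `urn:li:<entityType>:<id>`; the class name `UrnShapeCheck` and the method `looksLikeUrn` are hypothetical and are not part of this PR or of the project's own Urn classes.

```java
import java.util.regex.Pattern;

// Hypothetical helper for tests; not the project's real urn validation.
public class UrnShapeCheck {

    // Assumed general shape: urn:li:<entityType>:<id>
    private static final Pattern URN_SHAPE = Pattern.compile("^urn:li:[A-Za-z]+:.+$");

    /** Returns true when the value matches the assumed urn shape. */
    public static boolean looksLikeUrn(String value) {
        return value != null && URN_SHAPE.matcher(value).matches();
    }

    public static void main(String[] args) {
        // Example values are made up for illustration.
        System.out.println(looksLikeUrn("urn:li:dataFlow:(airflow,my_flow,prod)"));
        System.out.println(looksLikeUrn("fbaggins"));
    }
}
```

A check like this only covers the outer shape; it says nothing about whether the entity-specific key inside the parentheses is well formed.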
metadata-models/src/main/pegasus/com/linkedin/metadata/entity/DataFlowEntity.pdl
/**
 * Urn of the associated DataFlow
 */
flow: optional DataFlowUrn
could a datajob exist without a dataflow?
I guess in that case it should be a DataProcess 😄 I'll fix this!
This is actually a curious case. All other entities seem to have all fields other than the urn as optional, and actually when setting this as compulsory I get:
> Task :metadata-models:test FAILED
Gradle suite > Gradle test > com.linkedin.metadata.ModelValidation.validateEntities FAILED
com.linkedin.metadata.validator.InvalidSchemaException: Entity 'com.linkedin.metadata.entity.DataJobEntity' must contain an optional 'flow' field
at com.linkedin.metadata.validator.ValidationUtils.invalidSchema(ValidationUtils.java:44)
at com.linkedin.metadata.validator.EntityValidator.lambda$validateEntitySchema$1(EntityValidator.java:56)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at com.linkedin.metadata.validator.EntityValidator.validateEntitySchema(EntityValidator.java:55)
at com.linkedin.metadata.validator.EntityValidator.validateEntitySchema(EntityValidator.java:68)
at java.util.ArrayList.forEach(ArrayList.java:1257)
at com.linkedin.metadata.ModelValidation.validateEntities(ModelValidation.java:32)
I don't know the rationale for that validation requirement, though 🤷♂️
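For readers following along, the rule implied by the error message above can be sketched roughly as follows. This is an illustration only, assuming the rule is "every field other than the urn must be declared optional"; `OptionalFieldCheck` and `findViolations` are hypothetical names, and the real `EntityValidator` may work quite differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema check; not the real EntityValidator.
public class OptionalFieldCheck {

    /**
     * Returns the names of fields that break the assumed rule:
     * every field except "urn" must be declared optional.
     *
     * @param fields map from field name to whether it is declared optional
     */
    public static List<String> findViolations(Map<String, Boolean> fields) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Boolean> field : fields.entrySet()) {
            boolean isUrn = field.getKey().equals("urn");
            if (!isUrn && !field.getValue()) {
                violations.add(field.getKey());
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        Map<String, Boolean> dataJobEntity = new LinkedHashMap<>();
        dataJobEntity.put("urn", false);   // the urn itself is required
        dataJobEntity.put("flow", false);  // declared required, as in the attempted change
        System.out.println(findViolations(dataJobEntity)); // prints [flow]
    }
}
```

Under this assumed rule, making `flow` compulsory is exactly what trips the check, which matches the failure reported above.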
.setName(true)
.setDescription("My pipeline!")
.setOrchestrator(URN.getOrchestratorEntity())
.setOwners(new StringArray("fbaggins"))
haha love fbaggins
LGTM! This looks great.
@frsann : there does seem to be a build failure, missing file? https://github.com/linkedin/datahub/pull/2197/checks?check_run_id=2070006004#step:6:661
Still some minor issues with the new type. Will fix it later today.
LGTM!
This PR adds the backend implementation for DataJob and DataFlow (RFC, model implementation).
Checklist