Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datajob): Backend implementation #2197

Merged
merged 13 commits into from
Mar 13, 2021

Conversation

frsann
Copy link
Contributor

@frsann frsann commented Mar 9, 2021

This PR adds the backend implementation for DataJob and DataFlow (RFC, model implementation).

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

Comment on lines +7 to +9
"flowId^4",
"orchestrator",
"cluster"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit unsure about this. Improvement suggestions welcome.

Comment on lines +7 to +8
"name^4",
"dataFlow"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit unsure about this. Improvement suggestions welcome.

Copy link
Contributor

@gabe-lyons gabe-lyons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work- this looks good to me, aside from a few comments about browse paths. You also may want to consider using valid urns in your tests- even if they are working now it seems like it might be a time bomb that will blow if someone were to add a validation layer later down the line

@frsann
Copy link
Contributor Author

frsann commented Mar 9, 2021

Thanks @gabe-lyons!

You also may want to consider using valid urns in your tests

Which urns are not valid?

@gabe-lyons
Copy link
Contributor

@frsann I realize now that the urns should be valid- when I first looked i assumed they needed to be urn:li:etc, now I realize the urns are constructed as java classes and urn:li doesn't need to be provided to the constructor

/**
* Urn of the associated DataFlow
*/
flow: optional DataFlowUrn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could a datajob exist without a dataflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in that case it should be a DataProcess 😄 I'll fix this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a curious case. All other entities seem to have all fields other than the urn as optional, and actually when setting this as compulsory I get:

> Task :metadata-models:test FAILED

Gradle suite > Gradle test > com.linkedin.metadata.ModelValidation.validateEntities FAILED
    com.linkedin.metadata.validator.InvalidSchemaException: Entity 'com.linkedin.metadata.entity.DataJobEntity' must contain an optional 'flow' field
        at com.linkedin.metadata.validator.ValidationUtils.invalidSchema(ValidationUtils.java:44)
        at com.linkedin.metadata.validator.EntityValidator.lambda$validateEntitySchema$1(EntityValidator.java:56)
        at java.util.ArrayList.forEach(ArrayList.java:1257)
        at com.linkedin.metadata.validator.EntityValidator.validateEntitySchema(EntityValidator.java:55)
        at com.linkedin.metadata.validator.EntityValidator.validateEntitySchema(EntityValidator.java:68)
        at java.util.ArrayList.forEach(ArrayList.java:1257)
        at com.linkedin.metadata.ModelValidation.validateEntities(ModelValidation.java:32)

Don't know the rational about that validation requirement, though 🤷‍♂️

.setName(true)
.setDescription("My pipeline!")
.setOrchestrator(URN.getOrchestratorEntity())
.setOwners(new StringArray("fbaggins"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha love fbaggins

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This looks great.

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frsann
Copy link
Contributor Author

frsann commented Mar 12, 2021

Still some minor issues with the new type. Will fix it later today.

@frsann frsann requested a review from shirshanka March 12, 2021 21:01
Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@shirshanka shirshanka merged commit da6b3d1 into datahub-project:master Mar 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants