Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Java client method for dataset/job lineage #2623

Merged
merged 2 commits into from
Sep 28, 2023

Conversation

davidjgoss
Copy link
Contributor

@davidjgoss davidjgoss commented Sep 24, 2023

Problem

The Java SDK didn't have a method for the dataset/job-level lineage endpoint (GET /lineage). See https://marquezproject.slack.com/archives/C01E8MQGJP7/p1692774856981109

Closes: #1527

Solution

Adds a new method to MarquezClient for the endpoint, along with tests, and the necessary new subclasses of NodeData for datasets and jobs.

Also, reworks how the polymorphic deserialization is done to get away from the problem described in #1527 which I ran into when working on the new method. This was happening due to the way we were using @JsonTypeInfo. Specifically, we had the EXTERNAL_PROPERTY inclusion strategy on the NodeData interface class, however (per Jackson docs):

Note that this mechanism can only be used for properties, not for types (classes). Trying to use it for classes will result in inclusion strategy of basic PROPERTY instead.

This accounted for the extra type attribute being added on serialization - the intended behaviour of using the property on the parent Node was never happening. Unfortunately even moving the relevant annotations to the right places didn't work, I think because type is an existing property on Node. We'd kind of want a combination of Jackson's EXISTING_PROPERTY and EXTERNAL_PROPERTY but it doesn't exist.

Happily, using the DEDUCTION resolution strategy (TIL!) works nicely with no extra properties, because each of the subclasses has fields that are both unique and non-nullable, so Jackson can work it out via reflection. It does mean you can construct a Node with a type that contradicts the NodeData - but that was kind of the case anyway.

For backwards compatibility, the defaultImpl for NodeData in the client is set to the column lineage one. This is because when encountering a payload from the current Marquez API with the extraneous type property, the Jackson deduction will get confused and throw. So if consumers upgrade the client first and then Marquez itself, they should see no issues during the transition.

One-line summary:
Add Java client method for dataset/job lineage

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added api API layer changes client/java labels Sep 24, 2023
@codecov
Copy link

codecov bot commented Sep 24, 2023

Codecov Report

Merging #2623 (06eb2ef) into main (301487e) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #2623      +/-   ##
============================================
+ Coverage     83.33%   83.35%   +0.02%     
- Complexity     1291     1295       +4     
============================================
  Files           244      244              
  Lines          5940     5948       +8     
  Branches        279      279              
============================================
+ Hits           4950     4958       +8     
  Misses          844      844              
  Partials        146      146              
Files Coverage Δ
...va/src/main/java/marquez/client/MarquezClient.java 74.20% <100.00%> (+0.35%) ⬆️
...va/src/main/java/marquez/client/MarquezPathV1.java 77.77% <100.00%> (+0.35%) ⬆️
.../java/src/main/java/marquez/client/MarquezUrl.java 73.61% <100.00%> (+1.55%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@davidjgoss davidjgoss marked this pull request as ready for review September 24, 2023 16:38
Copy link
Member

@wslulciuc wslulciuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been wanting to add lineage calls to the java client for a while now! Amazing to see the functionality finally added. Thanks @davidjgoss! (And for fixing the extra type issue in the payload 💯)

@wslulciuc
Copy link
Member

Fixes #1527

@wslulciuc wslulciuc merged commit 01ed782 into MarquezProject:main Sep 28, 2023
3 checks passed
@davidjgoss
Copy link
Contributor Author

Thanks @wslulciuc

It might be worth including as a release note, that for users currently using the existing getColumnLineage method, they should upgrade and deploy their usage of the client first, and then Marquez itself, for backwards-compatibility.

Also, I realised this method is missing from the Python client as well, so raised #2625

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API layer changes client/java
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lineage response includes type twice in DatasetData
2 participants