Add Java client method for dataset/job lineage #2623
Codecov Report
@@ Coverage Diff @@
## main #2623 +/- ##
============================================
+ Coverage 83.33% 83.35% +0.02%
- Complexity 1291 1295 +4
============================================
Files 244 244
Lines 5940 5948 +8
Branches 279 279
============================================
+ Hits 4950 4958 +8
Misses 844 844
Partials 146 146
Signed-off-by: David Goss <[email protected]>
We've been wanting to add lineage calls to the Java client for a while now! Amazing to see the functionality finally added. Thanks @davidjgoss! (And for fixing the extra `type` issue in the payload 💯)
Fixes #1527
Thanks @wslulciuc

It might be worth including as a release note, that for users currently using the existing

Also, I realised this method is missing from the Python client as well, so raised #2625
Problem
The Java SDK didn't have a method for the dataset/job-level lineage endpoint (`GET /lineage`). See https://marquezproject.slack.com/archives/C01E8MQGJP7/p1692774856981109

Closes: #1527
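For orientation, here is a minimal sketch of what a raw call to this endpoint might look like using only the JDK, without the client. The base URL, the `nodeId` and `depth` query parameters, and the `<type>:<namespace>:<name>` node ID shape are assumptions made for illustration and should be checked against the Marquez API reference. The class and method names are hypothetical and are not the ones added to `MarquezClient` in this PR.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class LineageRequestSketch {
  // Assumed base URL of a locally running Marquez API; adjust for your deployment.
  public static final String BASE_URL = "http://localhost:5000/api/v1";

  /** Builds a node ID of the assumed form "<type>:<namespace>:<name>". */
  public static String nodeId(String type, String namespace, String name) {
    return type + ":" + namespace + ":" + name;
  }

  /** Builds (but does not send) a GET /lineage request for the given node. */
  public static HttpRequest lineageRequest(String nodeId, int depth) {
    // Query parameter names are assumptions; the node ID is URL-encoded because it contains ':'.
    String query =
        "nodeId=" + URLEncoder.encode(nodeId, StandardCharsets.UTF_8) + "&depth=" + depth;
    return HttpRequest.newBuilder(URI.create(BASE_URL + "/lineage?" + query)).GET().build();
  }

  public static void main(String[] args) {
    // Hypothetical dataset namespace and name.
    HttpRequest request = lineageRequest(nodeId("dataset", "my-namespace", "my-table"), 20);
    System.out.println(request.method() + " " + request.uri());
  }
}
```

The request is only constructed here. Sending it with `java.net.http.HttpClient` and mapping the JSON response onto node types is the part a client method would wrap.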
Solution
Adds a new method to `MarquezClient` for the endpoint, along with tests, and the necessary new subclasses of `NodeData` for datasets and jobs.

Also, reworks how the polymorphic deserialization is done to get away from the problem described in #1527, which I ran into when working on the new method. This was happening due to the way we were using `@JsonTypeInfo`. Specifically, we had the `EXTERNAL_PROPERTY` inclusion strategy on the `NodeData` interface class, however (per Jackson docs):

This accounted for the extra `type` attribute being added on serialization - the intended behaviour of using the property on the parent `Node` was never happening. Unfortunately, even moving the relevant annotations to the right places didn't work, I think because `type` is an existing property on `Node`. We'd kind of want a combination of Jackson's `EXISTING_PROPERTY` and `EXTERNAL_PROPERTY` but it doesn't exist.

Happily, using the `DEDUCTION` resolution strategy (TIL!) works nicely with no extra properties, because each of the subclasses has fields that are both unique and non-nullable, so Jackson can work it out via reflection. It does mean you can construct a `Node` with a type that contradicts the `NodeData` - but that was kind of the case anyway.

For backwards compatibility, the `defaultImpl` for `NodeData` in the client is set to the column lineage one. This is because when encountering a payload from the current Marquez API with the extraneous `type` property, the Jackson deduction will get confused and throw. So if consumers upgrade the client first and then Marquez itself, they should see no issues during the transition.

One-line summary:
Add Java client method for dataset/job lineage
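As a rough illustration of the deduction idea described under Solution, the sketch below models field-based subtype selection with a default fallback in plain Java. It is a simplified stand-in, not Jackson's actual algorithm and not the code in this PR. The subtype names, their field sets, and the rule "pick the single subtype that declares every field in the payload" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DeductionSketch {
  // Hypothetical subtypes and field names, chosen only for illustration.
  private static final Map<String, Set<String>> FIELDS_BY_SUBTYPE = new LinkedHashMap<>();

  static {
    FIELDS_BY_SUBTYPE.put("DatasetData", Set.of("namespace", "name", "sourceName"));
    FIELDS_BY_SUBTYPE.put("JobData", Set.of("namespace", "name", "inputs", "outputs"));
    FIELDS_BY_SUBTYPE.put("ColumnData", Set.of("namespace", "dataset", "field", "inputFields"));
  }

  // Fallback used when the payload does not identify exactly one subtype.
  private static final String DEFAULT_SUBTYPE = "ColumnData";

  /** Returns the single subtype declaring every payload field, else the default. */
  public static String deduce(Set<String> payloadFields) {
    List<String> candidates = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : FIELDS_BY_SUBTYPE.entrySet()) {
      if (entry.getValue().containsAll(payloadFields)) {
        candidates.add(entry.getKey());
      }
    }
    // Zero or several candidates means deduction failed, so fall back.
    return candidates.size() == 1 ? candidates.get(0) : DEFAULT_SUBTYPE;
  }

  public static void main(String[] args) {
    System.out.println(deduce(Set.of("namespace", "name", "sourceName")));
    System.out.println(deduce(Set.of("namespace", "name", "inputs", "outputs")));
    // A payload carrying a field that no subtype declares cannot be deduced.
    System.out.println(deduce(Set.of("namespace", "name", "sourceName", "type")));
  }
}
```

In this model, a payload with an unexpected extra field matches no subtype and lands on the default, which is the role `defaultImpl` plays for older payloads during the transition.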
Checklist
`CHANGELOG.md` (Depending on the change, this may not be necessary).
`.sql` database schema migration according to Flyway's naming convention (if relevant)