-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ability to decode static metadata events #2495
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2495 +/- ##
============================================
+ Coverage 83.80% 83.86% +0.05%
- Complexity 1233 1245 +12
============================================
Files 235 238 +3
Lines 5625 5657 +32
Branches 270 271 +1
============================================
+ Hits 4714 4744 +30
- Misses 767 769 +2
Partials 144 144
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
4a17af8
to
836cb03
Compare
836cb03
to
dd4aadf
Compare
log.warn("Unsupported event type {}. Skipping without error", event.getClass().getName()); | ||
|
||
// return serialized event | ||
asyncResponse.resume(Response.status(200).entity(event).build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
200
status code is correct for the new OL events, and feel we should also return a 200
when accepting OL run events (as outlined by the OL spec). The semantics should be: "Return 200 OK
to signify the OL event has been collected, and eventually will be processed." The 201
status code was never changed during the initial PoC phase of OL. More of a thought, and we'll want to have a follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea here was to distinguish between RunEvent
that get saved into database (201 created) and other event types that do not affect application state. At the end, once Marquez will be capable of storing dataset and job events, it should return 201 for all the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we save them in the lineage_events table to start with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@julienledem we'll want to write a proposal on how to handle DatasetEvent
s and JobEvent
s (see #2544). For now, let's ensure the event can be accepted (but not stored).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple questions, otherwise 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good to me. I made a few comments
log.warn("Unsupported event type {}. Skipping without error", event.getClass().getName()); | ||
|
||
// return serialized event | ||
asyncResponse.resume(Response.status(200).entity(event).build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we save them in the lineage_events table to start with?
|
||
@AllArgsConstructor | ||
public enum EventSchemaURL { | ||
LINEAGE_EVENT( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be called RUN_EVENT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, let's use RUN_EVENT
.
if (id == null) { | ||
return context.constructSpecializedType(superType, LINEAGE_EVENT.subType); | ||
} | ||
|
||
int lastSlash = id.lastIndexOf('/'); | ||
|
||
if (lastSlash < 0) { | ||
return context.constructSpecializedType(superType, LINEAGE_EVENT.subType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add comment that we default to run event for backwards compatibility
Signed-off-by: Pawel Leszczynski <[email protected]>
dd4aadf
to
24c3240
Compare
Problem
Static metadata events (known also as runless) where presented within in the proposal:
https://github.com/OpenLineage/OpenLineage/blob/main/proposals/1837/static_lineage.md
and is being introduced to Openlineage specification within the PR: OpenLineage/OpenLineage#1880
Of course, we would like to implement it within Marquez. We would like to achieve this ultimate goal gradually and this PR contains the first step.
The changes introduced within the PR:
RunEvent
events are being collected. Events are saved into database and http status code201 (created)
is returned.DatasetEvent
andJobEvent
are distinguished byschemaURL
property. If distinction cannot be made, events are treated the default way, asRunEvent
.DatasetEvent
andJobEvent
events are deserialised by standard Jackso libraryDatasetEvent
/JobEvent
classes. The response returns in such cases200 OK
http status code but DOES NOT SAVE data to database (this will be introduced later). The great benefit of that is that it is a good integration test and a demo of of how to do events' distinction on the server side.Relates to: #1868
Solution
/create
requests toLineageEvent
,DatasetEvent
orJobEvent
based on theschemaURL
field.RunEvents
.DatasetEvent
orJobEvent
, they get deserialised and distinguished, however no operation on database is performed.One-line summary:
Checklist
CHANGELOG.md
(Depending on the change, this may not be necessary)..sql
database schema migration according to Flyway's naming convention (if relevant)