Ability to decode static metadata events #2495

Merged Jul 17, 2023 (1 commit)

@pawel-big-lebowski (Collaborator) commented May 30, 2023

Problem

Static metadata events (also known as runless events) were presented in the proposal:
https://github.com/OpenLineage/OpenLineage/blob/main/proposals/1837/static_lineage.md

and are being introduced to the OpenLineage specification in the PR: OpenLineage/OpenLineage#1880

Of course, we would like to implement it within Marquez. We would like to achieve this ultimate goal gradually and this PR contains the first step.

The changes introduced within the PR:

  • Don't affect the way RunEvent events are collected. Events are saved to the database and HTTP status code 201 (Created) is returned.
  • DatasetEvent and JobEvent are distinguished by the schemaURL property. If the distinction cannot be made, events are treated the default way, as RunEvent.
  • DatasetEvent and JobEvent events are deserialised by the standard Jackson library into the DatasetEvent/JobEvent classes. In such cases the response returns a 200 OK HTTP status code but DOES NOT SAVE data to the database (this will be introduced later). The benefit is that this serves as a good integration test and a demo of how to distinguish events on the server side.
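
The behaviour described in the list above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the code from this PR: `CollectSketch`, its nested event types and the in-memory `stored` list are assumptions standing in for the real resource class and the database.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are Marquez's actual classes. */
final class CollectSketch {
  interface Event {}
  static final class RunEvent implements Event {}
  static final class DatasetEvent implements Event {}
  static final class JobEvent implements Event {}

  // Stand-in for the database: only run events are ever added here.
  private final List<Event> stored = new ArrayList<>();

  /** Returns the HTTP status code that would be sent for the given event. */
  int collect(Event event) {
    if (event instanceof RunEvent) {
      stored.add(event); // run events are persisted, as before
      return 201;        // Created
    }
    // Dataset and job events are decoded but not saved in this first step.
    return 200;          // OK
  }

  int storedCount() {
    return stored.size();
  }

  public static void main(String[] args) {
    CollectSketch sketch = new CollectSketch();
    System.out.println(sketch.collect(new RunEvent()));
    System.out.println(sketch.collect(new DatasetEvent()));
    System.out.println(sketch.storedCount());
  }
}
```

Only the run event changes the stored state here, which mirrors the 201 versus 200 distinction described in the list.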

Relates to: #1868

Solution

  • EventTypeResolver is introduced which decodes incoming /create requests to LineageEvent, DatasetEvent or JobEvent based on the schemaURL field.
  • Nothing changes for RunEvents.
  • DatasetEvent and JobEvent are deserialised and distinguished, but no database operation is performed.
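
A minimal sketch of the schemaURL-based distinction, written in plain Java without Jackson so that it stays self-contained. The class name `SchemaUrlSketch`, the `EventKind` enum and the example URLs are assumptions for illustration; the real EventTypeResolver plugs into Jackson's type resolution and may differ in detail. The fallback to a run event when the URL is missing or has no slash follows the snippet under review in this PR.

```java
/** Hypothetical sketch; not the EventTypeResolver from this PR. */
final class SchemaUrlSketch {
  enum EventKind { RUN_EVENT, DATASET_EVENT, JOB_EVENT }

  /** Picks an event kind from the last path segment of the schemaURL. */
  static EventKind resolve(String schemaUrl) {
    if (schemaUrl == null) {
      return EventKind.RUN_EVENT; // default when the field is absent
    }
    int lastSlash = schemaUrl.lastIndexOf('/');
    if (lastSlash < 0) {
      return EventKind.RUN_EVENT; // default when the URL cannot be parsed
    }
    switch (schemaUrl.substring(lastSlash + 1)) {
      case "DatasetEvent":
        return EventKind.DATASET_EVENT;
      case "JobEvent":
        return EventKind.JOB_EVENT;
      default:
        return EventKind.RUN_EVENT; // unknown names are treated as run events
    }
  }

  public static void main(String[] args) {
    // Example URL shape is an assumption made for this sketch.
    String base = "https://openlineage.io/spec/2-0-0/OpenLineage.json#/definitions/";
    System.out.println(resolve(base + "DatasetEvent"));
    System.out.println(resolve(base + "JobEvent"));
    System.out.println(resolve(null));
  }
}
```

Defaulting to a run event keeps existing producers working when they send no schemaURL or an unrecognised one.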

One-line summary:

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg bot added api API layer changes client/java labels May 30, 2023

codecov bot commented May 30, 2023

Codecov Report

Merging #2495 (24c3240) into main (23de0de) will increase coverage by 0.05%.
The diff coverage is 95.00%.

@@             Coverage Diff              @@
##               main    #2495      +/-   ##
============================================
+ Coverage     83.80%   83.86%   +0.05%     
- Complexity     1233     1245      +12     
============================================
  Files           235      238       +3     
  Lines          5625     5657      +32     
  Branches        270      271       +1     
============================================
+ Hits           4714     4744      +30     
- Misses          767      769       +2     
  Partials        144      144              
Impacted Files Coverage Δ
...main/java/marquez/service/models/LineageEvent.java 95.23% <ø> (ø)
...java/marquez/service/models/EventTypeResolver.java 92.00% <92.00%> (ø)
...src/main/java/marquez/api/OpenLineageResource.java 90.00% <100.00%> (+1.11%) ⬆️
...rc/main/java/marquez/service/models/BaseEvent.java 100.00% <100.00%> (ø)
.../main/java/marquez/client/models/LineageEvent.java 100.00% <100.00%> (ø)


@pawel-big-lebowski pawel-big-lebowski changed the title decode static metadata events Ability to decode static metadata events May 30, 2023
@boring-cyborg boring-cyborg bot added the docs label May 30, 2023
@pawel-big-lebowski pawel-big-lebowski marked this pull request as ready for review May 30, 2023 11:38
@harels harels requested a review from julienledem June 1, 2023 22:15
log.warn("Unsupported event type {}. Skipping without error", event.getClass().getName());

// return serialized event
asyncResponse.resume(Response.status(200).entity(event).build());
@wslulciuc (Member) commented Jun 2, 2023

A 200 status code is correct for the new OL events, and I feel we should also return a 200 when accepting OL run events (as outlined by the OL spec). The semantics should be: "Return 200 OK to signify the OL event has been collected, and eventually will be processed." The 201 status code was never changed during the initial PoC phase of OL. More of a thought, and we'll want to have a follow-up PR.

Collaborator Author

The idea here was to distinguish between RunEvents, which get saved into the database (201 Created), and other event types, which do not affect application state. In the end, once Marquez is capable of storing dataset and job events, it should return 201 in all cases.

Member

should we save them in the lineage_events table to start with?

Member

@julienledem we'll want to write a proposal on how to handle DatasetEvents and JobEvents (see #2544). For now, let's ensure the event can be accepted (but not stored).

@wslulciuc (Member) left a comment

Left a couple questions, otherwise 👍

@julienledem (Member) left a comment

Overall, this looks good to me. I made a few comments


@AllArgsConstructor
public enum EventSchemaURL {
LINEAGE_EVENT(
Member

should this be called RUN_EVENT?

Member

I agree, let's use RUN_EVENT.

Comment on lines +63 to +71
if (id == null) {
return context.constructSpecializedType(superType, LINEAGE_EVENT.subType);
}

int lastSlash = id.lastIndexOf('/');

if (lastSlash < 0) {
return context.constructSpecializedType(superType, LINEAGE_EVENT.subType);
}
Member

Maybe add a comment that we default to run event for backwards compatibility

Signed-off-by: Pawel Leszczynski <[email protected]>
Labels
api API layer changes client/java docs