-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Runless events - consume job event #2661
Conversation
5158fc2
to
bdcfec7
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #2661 +/- ##
============================================
+ Coverage 83.76% 84.05% +0.28%
- Complexity 1338 1379 +41
============================================
Files 247 248 +1
Lines 6112 6297 +185
Branches 281 286 +5
============================================
+ Hits 5120 5293 +173
- Misses 843 851 +8
- Partials 149 153 +4 ☔ View full report in Codecov by Sentry. |
32deb1a
to
151ed6b
Compare
) e | ||
GROUP BY e.run_uuid | ||
) f ON f.run_uuid=jv.latest_run_uuid | ||
LEFT OUTER JOIN job_versions_facets f ON j.current_version_uuid = f.job_version_uuid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
* @param jobRow The job. | ||
* @return A {@link BagOfJobVersionInfo} object. | ||
*/ | ||
default BagOfJobVersionInfo upsertRunlessJobVersion( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: We can consider factoring out common code in JobVersionDao.upsertJobVersionOnRunTransition() and upsertRunlessJobVersion()
to avoid duplication. But, not a major concern as we are aware we'll need to revisit some of this code later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can do later further refactor with this section.
api/src/main/resources/marquez/db/migration/V65__dataset_facets_lineage_event_type_nullable.sql
Outdated
Show resolved
Hide resolved
api/src/main/resources/marquez/db/migration/V66.1__job_facets_changes.sql
Outdated
Show resolved
Hide resolved
api/src/main/resources/marquez/db/migration/V66.1__job_facets_changes.sql
Outdated
Show resolved
Hide resolved
CREATE INDEX job_facets_job_version_uuid ON job_facets (job_version_uuid); | ||
|
||
ALTER TABLE lineage_events ADD COLUMN spec_event_type VARCHAR(64); | ||
UPDATE lineage_events SET spec_event_type = 'RunEvent'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to set all events to RunEvent
? Also, I'd define the _event_type
enum as RUN_EVENT
, DATASET_EVENT
, or JOB_EVENT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to enum and set a default value instead of running update table.
84f4670
to
de98803
Compare
✅ Deploy Preview for peppy-sprite-186812 canceled.
|
@@ -0,0 +1,4 @@ | |||
CREATE TYPE EVENT_TYPE AS ENUM ('RUN_EVENT', 'DATASET_EVENT', 'JOB_EVENT'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: We standardized on not defining enums types in the DB layer but rather enforce them in the application layer. This way, it avoids a DB migration every time we (might) add a new event type.
tl;dr, I'd just define _event_type
as a string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added extra commit for that: 0a3f98a
DATASET_EVENT, | ||
JOB_EVENT; | ||
} | ||
|
||
@SqlUpdate( | ||
"INSERT INTO lineage_events (" | ||
+ "event_type, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: We can consider defining a _run_state
column and eventually dropping the event_type
. That is, we can consider columns prefixed with _
to be "remappings" of OL properties to Marquez.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments / suggests, otherwise 💯 💯 🥇
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Pawel Leszczynski <[email protected]>
b7e40f8
to
0a3f98a
Compare
Problem
As a followup of #2641, this PR introduces support for
JobEvent
. PR shall be merged after #2641.Solution
job_version_uuid
column is added tojob_facets
table. This requires db migration to backfill existingjob_facets
entries.JobDao
asfindJobByName
method should work without join toruns
table.spec_event_type
tolineage_events
table to indicate which type of event is stored.Limitations:
listLineage
endpoints filtersRunEvent
only and does not support introduced event types. This can be implemented within other PR and issue.One-line summary:
Checklist
CHANGELOG.md
(Depending on the change, this may not be necessary)..sql
database schema migration according to Flyway's naming convention (if relevant)