-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid locking common records and remove transaction annotation #2036
Conversation
Signed-off-by: Michael Collado <[email protected]>
Signed-off-by: Michael Collado <[email protected]>
Signed-off-by: Michael Collado <[email protected]>
Signed-off-by: Michael Collado <[email protected]>
Signed-off-by: Michael Collado <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #2036 +/- ##
============================================
+ Coverage 78.89% 78.92% +0.03%
- Complexity 1014 1018 +4
============================================
Files 197 199 +2
Lines 5548 5557 +9
Branches 421 421
============================================
+ Hits 4377 4386 +9
Misses 724 724
Partials 447 447
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
default NamespaceRow upsertNamespaceRow( | ||
UUID uuid, Instant now, String name, String currentOwnerName) { | ||
doUpsertNamespaceRow(uuid, now, name, currentOwnerName); | ||
return findNamespaceByName(name).orElseThrow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: The RETURNING *
was to avoid inserting, then querying for the namespace in two separate queries. But, also understand the cost of always updating updated_at
to trigger the modified row to be returned. At some point, I'd love to flush out the usage of updated_at
for all of our tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, the RETURNING *
only returns if the record was modified - if we want to DO NOTHING
to avoid locking the record, we have to execute a separate query 🤷🏽♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great analysis and write up @collado-mike on the DB performance! Who knew such "minor" changes could improve query performance so much. As for transactions, I agree. Originally, we wanted to keep the write path for Marquez in sync with metadata collected via OL. Now, I do think we'll have to wrap some DB calls in a transaction, but let's add them as we encounter data inconsistencies.
Problem
The OpenLineage API implementation is unnecessarily wrapped in a
@Transaction
annotation, which means that under heavy write load, there's a lot of unnecessary contention. For example, we lock theNamespace
record in theupsertNamespaceRow
method, even though we don't really update the namespace. Since the whole request is wrapped in a transaction, the namespace record is locked for the entire time we write new jobs, runs, and datasets. If many jobs for the same namespace run simultaneously, each OpenLineage request will wait until the namespace record is unlocked, forming a queue of requests, rather than handling each in parallel. This also happens forJobContext
s andRunArg
s, as oftentimes, many jobs will run with no arguments, no source code location, and no SQL facet. Many completely unrelated jobs will block while awaiting a lock on an emptyJobContext
and/or emptyRunArgs
.I updated each of these
upsert
queries to do nothing on conflict (they're really immutable records) to avoid locking the impacted record.Moreover, I don't see a reason for this API to be wrapped in a transaction at all. These changes don't need to be made atomically. OpenLineage requests are mostly idempotent, so if only partial database updates are applied and the same request made twice, there's no duplication in the data.
Job
s,Run
s,Dataset
s,JobContext
s, andRunArg
s created in the first request will be found and used in the second. Foreign key constraints require us to always write a new record before updating an existing one to reference it, so there's no opportunity to leave the database in an inconsistent state.This screenshot shows the database under heavy write load - note that waits are dominated by
transactionid
(Waiting for a transaction to finish) and the primary contender in this case is theNamespaceDao
.Checklist
CHANGELOG.md
with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary).sql
database schema migration according to Flyway's naming convention (if relevant)