[ADAP-1077] [Drops Table In Separate Transaction] - Tables are dropped then the _tmp table renamed, causing other queries referencing the table to fail #693
This is a regression that requires some design.
@VersusFacit I added a backport label!
@gtmehrer Hey there! Thanks for the submission. Just an upfront disclaimer that autocommit is on by default. I recreated your logs essentially as you described. My model:
**The Logs**

You gave some log output, but I'm curious whether that's a red herring of sorts. My thinking: if you dig into core, you'll notice that a new statement comes into play there. Maybe Mike is right that some special interaction of the new replace macros (added in this PR, by the way) could be responsible, but I actually think it's a higher-level configuration problem. In dbt land, autocommit is on by default due to the needs of analytics engineers paired with the inertia of the existing transaction semantics.

**A possible builtin solution to your issue**
As in, you are seeing the model get dropped while other threads attempt to pull from it, whereas you want the rename and drop to happen in the same transaction? Could you try with `autocommit: false`?

**Request for more info**

Also, if you continue to have troubles, could you say more about your use case? It's not immediately clear from this description what your error is or how it's being triggered, but I'd love to know more. I want to be able to reproduce this. Without your use case in front of me, my best attempts to replicate failed regardless of whether I left autocommit on or off.
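For a rough illustration of why the autocommit setting matters here, the sketch below uses redshift_connector directly; the table names, schema, and connection details are hypothetical, and this is not the exact SQL or sequencing dbt emits.

```python
import redshift_connector


def refresh_with_autocommit(conn: redshift_connector.Connection) -> None:
    """Each statement commits as soon as it runs, so there is a window between
    the DROP and the RENAME in which other sessions see no table at all."""
    conn.rollback()            # autocommit must be toggled outside a transaction
    conn.autocommit = True
    cur = conn.cursor()
    cur.execute("drop table if exists my_schema.my_table")
    cur.execute("alter table my_schema.my_table__tmp rename to my_table")


def refresh_in_one_transaction(conn: redshift_connector.Connection) -> None:
    """Both statements sit in one transaction and become visible to other
    sessions atomically when the transaction commits."""
    conn.rollback()
    conn.autocommit = False
    cur = conn.cursor()
    cur.execute("drop table if exists my_schema.my_table")
    cur.execute("alter table my_schema.my_table__tmp rename to my_table")
    conn.commit()


if __name__ == "__main__":
    # Hypothetical connection details.
    conn = redshift_connector.connect(
        host="my-cluster.example.redshift.amazonaws.com",
        database="analytics",
        user="dbt_user",
        password="...",
    )
    refresh_in_one_transaction(conn)
```

The point of the sketch is only that, with autocommit on, the drop becomes visible to other sessions before the rename lands.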
There is an alternate possibility the team is exploring: it could be we need to remove some BEGIN/COMMIT barriers in special edge cases like this one you may have found. I have a PR up for that just in case. We'd still very much appreciate you trying the autocommit adjustment as a sanity check and providing more insight into your use case 🖖
Mila and I looked at the BEGIN/COMMIT barriers in 1.6 vs. 1.7.
@VersusFacit and I talked through this quite a bit and we're pretty sure we found the issue. The root cause is that we are setting a class variable on `RedshiftRelation`.

**Diagnosis**

We set `renameable_relations` and attempt to override it with the correct configuration for `RedshiftRelation`.
Unfortunately, the table materialization was updated to include this conditional:

dbt-redshift/dbt/include/redshift/macros/materializations/table.sql, lines 40 to 44 (at d77f5ee)
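For orientation, the gate in question works roughly like the following paraphrase; the names are generic, this is not the actual macro or adapter source, and it assumes the relation checks its own type against `renameable_relations`.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class Relation:
    """Simplified stand-in for a dbt relation object."""

    type: str = "table"
    renameable_relations: FrozenSet[str] = field(default_factory=frozenset)

    @property
    def can_be_renamed(self) -> bool:
        # The staged-rename path is only taken for relation types listed here.
        # If the set is empty, the materialization falls back to drop-and-recreate.
        return self.type in self.renameable_relations


print(Relation().can_be_renamed)                                               # False
print(Relation(renameable_relations=frozenset({"table", "view"})).can_be_renamed)  # True
```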
The issue is that the override never takes effect.

**Demonstrating Test**

The following test demonstrates the problem:

```python
from dbt.adapters.redshift.relation import RedshiftRelation
from dbt.contracts.relation import RelationType


def test_renameable_relation():
    relation = RedshiftRelation.create(
        database="my_db",
        schema="my_schema",
        identifier="my_table",
        type=RelationType.Table,
    )
    assert relation.renameable_relations == frozenset({
        RelationType.View,
        RelationType.Table
    })
```

This test currently fails because `renameable_relations` falls back to the empty default.

**Fix**

We need to update this line:

```python
renameable_relations: SerializableIterable = field(default_factory=tuple)
```

While we're here, the same mistake was made in a related area on the next line of code, and the fix needs to be implemented there as well.

**Risks**

We've been operating under the assumption that we've been using the overridden configuration.
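To make the class-variable pitfall concrete, here is a self-contained sketch with generic names; these are not the actual dbt classes, and the real fix may differ in detail.

```python
from dataclasses import dataclass, field
from typing import FrozenSet


@dataclass(frozen=True)
class BaseRelation:
    # The inherited field defaults to an empty collection via a factory.
    renameable_relations: FrozenSet[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class BrokenRelation(BaseRelation):
    # A plain class attribute with no annotation: the dataclass machinery
    # ignores it, and __init__ still assigns the inherited (empty) default,
    # which shadows this value on every instance.
    renameable_relations = frozenset({"table", "view"})


@dataclass(frozen=True)
class FixedRelation(BaseRelation):
    # Redeclaring the field with an annotation and a default_factory makes
    # the override the actual instance value.
    renameable_relations: FrozenSet[str] = field(
        default_factory=lambda: frozenset({"table", "view"})
    )


print(BrokenRelation().renameable_relations)  # frozenset(): override silently ignored
print(FixedRelation().renameable_relations)   # frozenset with 'table' and 'view'
```

The broken variant is the shape described in the diagnosis; the fixed variant shows the kind of change the fix section is pointing at.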
That is some great info @mikealfare and @VersusFacit 🧠 While we're in this area of code, here's one other thing that @VersusFacit and I noticed in #693 (comment):
It doesn't appear to ever get created, so it's a no-op. But it looks like it may be an unnecessary statement. Maybe it will be naturally handled along with the other fixes?
I don't think it will be, since it's an unconditional drop. I think it's just a leftover artifact. I can make a ticket to track dealing with this at some point :)
Thanks so much for digging into this! @VersusFacit I did try with autocommit set to both True and False, with the same results either way. Our use case is that we refresh many models frequently throughout the day, around every 30 minutes. Outside processes querying dbt-managed tables will fail when they attempt to reference a table that has been dropped (when the transaction is immediately committed by dbt). The table is generally only gone for a few seconds before it's recreated by another transaction, so it's a little difficult to replicate without high query volume. Also, it just occurred to me that we are using Redshift's snapshot isolation, which may result in a different error than serializable isolation would. https://docs.aws.amazon.com/redshift/latest/dg/r_STV_DB_ISOLATION_LEVEL.html Let me know if I can provide more info or help with testing!
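One way to approximate the high query volume needed to catch the window is a polling reader, sketched below with threads and redshift_connector; the table name, schema, and connection details are hypothetical, and any exception in the reader loop simply indicates it hit the gap between the drop and the recreate.

```python
import threading
import time

import redshift_connector


def make_connection() -> redshift_connector.Connection:
    # Hypothetical connection details.
    return redshift_connector.connect(
        host="my-cluster.example.redshift.amazonaws.com",
        database="analytics",
        user="dbt_user",
        password="...",
    )


def reader_loop(stop: threading.Event) -> None:
    """Poll the model the way an outside process would while dbt refreshes it."""
    conn = make_connection()
    cur = conn.cursor()
    while not stop.is_set():
        try:
            cur.execute("select count(*) from my_schema.my_table")
            cur.fetchone()
            conn.rollback()  # end the read transaction so each poll sees fresh state
        except Exception as exc:
            print(f"reader failed, likely while the table was missing: {exc}")
            conn.rollback()
        time.sleep(0.1)


if __name__ == "__main__":
    stop = threading.Event()
    readers = [threading.Thread(target=reader_loop, args=(stop,)) for _ in range(4)]
    for t in readers:
        t.start()
    # Run `dbt run` against the table model in another terminal while this polls.
    time.sleep(120)
    stop.set()
    for t in readers:
        t.join()
```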
We merged the fix today. We'll need to do a patch release before it's available on PyPI, which should happen in the next week. Feel free to install from the branch. Closing as resolved.
@mikealfare has the fix been included in the patch release?
It is. We published a few hours ago. |
Thank you @mikealfare 🙌
Is this a new bug in dbt-redshift?
Current Behavior
Models with table materialization are dropped and the transaction is committed immediately. This causes queries that reference the refreshing model to fail if they run at the same time, since the table briefly doesn't exist.
I've tried setting `autocommit: true` in profiles.yml to no effect.

Expected Behavior
The old behavior was to drop/rename in one transaction.
Steps To Reproduce
Run the `dbt run` command in dbt-redshift >=1.7.1 (core >=1.7.3) against a table model. dbt-redshift 1.6.5 (core 1.6.9) runs in a single transaction.
Relevant log output
Environment
Additional Context
No response