Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-1077] [Drops Table In Separate Transaction] - Tables are dropped then the _tmp table renamed, causing other queries referencing the table to fail #693

Closed
2 tasks done
gtmehrer opened this issue Dec 19, 2023 · 13 comments
Assignees
Labels
bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe regression support

Comments

@gtmehrer
Copy link

gtmehrer commented Dec 19, 2023

Is this a new bug in dbt-redshift?

  • I believe this is a new bug in dbt-redshift
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Models with table materialization are dropped then the transaction is committed. This causes queries referencing the refreshing model to fail if they run at the same time since the table doesn't exist.

I've tried setting autocommit: true in profiles.yml to no effect.

Expected Behavior

The old behavior was to drop/rename in one transaction.

Steps To Reproduce

Run dbt run command in dbt-redshift >=1.7.1 (core >=1.7.3) against a table model.

dbt-redshift 1.6.5 (core 1.6.9) runs in a single transaction

Relevant log output

[0m08:54:14.597104 [debug] [Thread-2  ]: On model.$$$$: BEGIN
�[0m08:54:14.646101 [debug] [Thread-2  ]: SQL status: SUCCESS in 0.0 seconds
�[0m08:54:14.669096 [debug] [Thread-2  ]: Applying DROP to: $$$$
�[0m08:54:14.671270 [debug] [Thread-2  ]: Using redshift connection "model.$$$$"
�[0m08:54:14.672171 [debug] [Thread-2  ]: On model.$$$$: /* {"app": "dbt", "dbt_version": "1.7.3", "profile_name": "redshift", "target_name": "$$$$", "node_id": "model.$$$$"} */
drop table if exists $$$$ cascade
�[0m08:54:14.719695 [debug] [Thread-2  ]: SQL status: SUCCESS in 0.0 seconds
�[0m08:54:14.723391 [debug] [Thread-2  ]: On model.$$$$: COMMIT

Environment

- OS: macOS sonoma
- Python: 3.8.13
- dbt-core: 1.7.4
- dbt-redshift: 1.7.1

Additional Context

No response

@gtmehrer gtmehrer added bug Something isn't working triage labels Dec 19, 2023
@github-actions github-actions bot changed the title [Drops Table In Separate Transaction] - Tables are dropped then the _tmp table renamed, causing other queries referencing the table to fail [ADAP-1077] [Drops Table In Separate Transaction] - Tables are dropped then the _tmp table renamed, causing other queries referencing the table to fail Dec 19, 2023
@mikealfare mikealfare added refinement Product or leadership input needed and removed triage labels Feb 2, 2024
@mikealfare
Copy link
Contributor

This is a regression that requires some design, hence the refinement tag. I believe the issue is that we moved from psycopg2 to redshift-connector. redshift-connector implements DBAPI2.0 correctly, which assumes execution of statements one at a time (you need to build the transaction manually if you want it). Whereas psycopg2 allows you to pass it multiple statements and it will automatically "wrap" it in a transaction. The assumption of the latter scenario exists in many places throughout dbt-redshift. This change was made in 1.5, so I'm a little surprised that this was working as a single transaction in 1.6. Perhaps it's the interaction of this scenario, along with the implementation of the replace set of macros that handles swapping tables with materialized views and vice versa.

@mikealfare mikealfare added High Severity bug with significant impact that should be resolved in a reasonable timeframe and removed refinement Product or leadership input needed labels Feb 7, 2024
@VersusFacit VersusFacit self-assigned this Feb 15, 2024
@martynydbt
Copy link
Contributor

@VersusFacit i added a backport label!

@VersusFacit
Copy link
Contributor

VersusFacit commented Feb 23, 2024

@gtmehrer Hey there! Thanks for the submission. Just an upfront disclaimer that autocommit is True by default.

I recreated your logs essentially as you described.

My model:

{{
config(
    materialized='table'
)
}}

select 2

The Logs

You gave some log output, but I'm curious whether that's a red herring of sorts. My thinking: If you dig into core, you'll notice that a new statement for DROP was added, yes! But that's just a log message. The underlying semantics haven't really changed on the calling of that drop statement. Maybe the order has changed. For verifying that, we need more info 🕵️‍♀️

Maybe Mike is right that some special interaction of the new replace macros (added in this PR by the way) could be responsible, but I actually think it's a higher level configuration problem.

In dbt land, autocommit is on by default due to the needs of analytics engineers paired with the inertia of the semantics of psycopg2. Mike tangentially evokes this above.

A possible builtin solution to your issue

This causes queries referencing the refreshing model to fail if they run at the same time since the table doesn't exist.

As in you are seeing the model get dropped while other threads attempt to pull from it, whereas you want the rename and drop to happen in the same transaction?

Could you try with autocommit: False to see if that addresses your issue? That would prevent the always-commit-after-an-operation semantics you're running up against, I think!

Request for more info

Also, if you continue to have troubles, could you say more on your use case? It's not immediately clear what your error is from this description or how it's being triggered, but I'd love to know more. I want to be able to reproduce this. Without your use case in front of me, my best attempts to replicate failed regardless of whether I left autocommit on.

@VersusFacit
Copy link
Contributor

There is an alternate possibility the team is exploring: It could be we need to remove some BEGIN/COMMIT barriers in special edge cases like this one you may have found. I have a PR up for that just in case. We'd still very much appreciate you trying the autocommit adjustment as a sanity check and providing more insight into your use case 🖖

@dbeatty10
Copy link
Contributor

Mila and I looked at the BEGIN/COMMIT barriers in 1.6 vs. 1.7.

1.6

BEGIN
create  table
    "ci"."dbt_dbeatty"."my_model__dbt_tmp"
    
    
    
  as (
    

select 1 as id
  );
alter table "ci"."dbt_dbeatty"."my_model" rename to "my_model__dbt_backup"
alter table "ci"."dbt_dbeatty"."my_model__dbt_tmp" rename to "my_model"
COMMIT
BEGIN
drop table if exists "ci"."dbt_dbeatty"."my_model__dbt_backup" cascade
COMMIT
BEGIN

1.7

BEGIN
create  table
    "ci"."dbt_dbeatty"."my_model__dbt_tmp"
    
    
    
  as (
    

select 1 as id
  );
COMMIT
BEGIN
drop table if exists "ci"."dbt_dbeatty"."my_model" cascade
COMMIT
BEGIN
alter table "ci"."dbt_dbeatty"."my_model__dbt_tmp" rename to "my_model"
COMMIT
BEGIN
drop table if exists "ci"."dbt_dbeatty"."my_model__dbt_backup" cascade
COMMIT
BEGIN

Here's the diff showing two additional COMMIT/BEGIN pairs in 1.7 vs. 1.6:

12c12,16
< alter table "ci"."dbt_dbeatty"."my_model" rename to "my_model__dbt_backup"
---
> COMMIT
> BEGIN
> drop table if exists "ci"."dbt_dbeatty"."my_model" cascade
> COMMIT
> BEGIN

Here's the nitty-gritty details of how we got those commands (using jq):

Reprex

models/my_model.sql

{{ config(materialized="table") }}

select 1 as id

Install jq in macOS using Homebrew (or an alternative command for your environment):

brew install jq

Create virtual environments for dbt 1.6 and 1.7:

python3 -m venv dbt_1.6
source dbt_1.6/bin/activate
python -m pip install --upgrade pip
python -m pip install dbt-core~=1.6.0 dbt-postgres~=1.6.0 dbt-redshift~=1.6.0
source dbt_1.6/bin/activate
dbt --version
deactivate
python3 -m venv dbt_1.7
source dbt_1.7/bin/activate
python -m pip install --upgrade pip
python -m pip install dbt-core~=1.7.0 dbt-postgres~=1.7.0 dbt-redshift~=1.7.0
source dbt_1.7/bin/activate
dbt --version
deactivate

Use 1.6 to run the model and create JSON logs. Use jq to isolate the relevant log messages, then write them to their own file.

source dbt_1.6/bin/activate
rm -rf logs
dbt --log-format-file json run -s my_model
jq -r -c '.data | select(.node_info.resource_type == "model" and .node_info.node_name == "my_model") | select(.sql != null) | .sql' logs/dbt.log | while IFS= read -r line; do echo $line; done > dbt.log.my_model.1.6.sql
deactivate

Use 1.7 to run the model and create JSON logs. Use jq to isolate the relevant log messages, then write them to their own file.

source dbt_1.7/bin/activate
rm -rf logs
dbt --log-format-file json run -s my_model
jq -r -c '.data | select(.node_info.resource_type == "model" and .node_info.node_name == "my_model") | select(.sql != null) | .sql' logs/dbt.log | while IFS= read -r line; do echo $line; done > dbt.log.my_model.1.7.sql
deactivate

Look at the diff between the the SQL that was executed in 1.6 vs. 1.7:

diff dbt.log.my_model.1.6.sql dbt.log.my_model.1.7.sql
12c12,16
< alter table "ci"."dbt_dbeatty"."my_model" rename to "my_model__dbt_backup"
---
> COMMIT
> BEGIN
> drop table if exists "ci"."dbt_dbeatty"."my_model" cascade
> COMMIT
> BEGIN

@mikealfare
Copy link
Contributor

@VersusFacit and I talked through this quite a bit and we're pretty sure we found the issue. The root cause is that we are setting a class variable on BaseRelation in such a way that it does not get properly updated at the implementation layer.

Diagnosis

We set renameable_relations in dbt-core here:
https://github.com/dbt-labs/dbt-core/blob/0a6d0c158e5a1956bd48e53793ca3b28faee2b34/core/dbt/adapters/base/relation.py#L44

and attempt to override it with the correct configuration for dbt-redshift here:

renameable_relations = frozenset(

Unfortunately, because BaseRelation is a frozen dataclass, the override does not take affect. We didn't notice this in 1.6 because dbt-redshift uses the default table materialization in 1.6. However, we needed to use the renameable relations concept in 1.7 because of interactions between tables and materialized views. This required us to override the default table materialization to update this logic:
https://github.com/dbt-labs/dbt-core/blob/0a6d0c158e5a1956bd48e53793ca3b28faee2b34/core/dbt/include/global_project/macros/materializations/models/table.sql#L40

to include this conditional:

{% if existing_relation.can_be_renamed %}
{{ adapter.rename_relation(existing_relation, backup_relation) }}
{% else %}
{{ drop_relation_if_exists(existing_relation) }}
{% endif %}

The issue is that existing_relation.renameable_relations is always empty, hence existing_relation.can_be_renamed is always False. The correct logic path passes the conditional, which would result in the original behavior from dbt-core. Instead, we run through the else path and drop the relation.

Demonstrating Test

The following test in dbt-redshift demonstrates the issue:

from dbt.adapters.redshift.relation import RedshiftRelation
from dbt.contracts.relation import RelationType


def test_renameable_relation():
    relation = RedshiftRelation.create(
        database="my_db",
        schema="my_schema",
        identifier="my_table",
        type=RelationType.Table,
    )
    assert relation.renameable_relations == frozenset({
        RelationType.View,
        RelationType.Table
    })

This test currently fails because relation.renameable_relations==().

Fix

We need to update this line:
https://github.com/dbt-labs/dbt-core/blob/0a6d0c158e5a1956bd48e53793ca3b28faee2b34/core/dbt/adapters/base/relation.py#L44
to use the field method like this:
https://github.com/dbt-labs/dbt-core/blob/0a6d0c158e5a1956bd48e53793ca3b28faee2b34/core/dbt/adapters/base/relation.py#L36
Since the default is a built-in (tuple), you should probably be able to avoid the lambda and do:

renameable_relations: SerializableIterable = field(default_factory=tuple)

While we're here, the same mistake was made in a related area on the next line of code:
https://github.com/dbt-labs/dbt-core/blob/0a6d0c158e5a1956bd48e53793ca3b28faee2b34/core/dbt/adapters/base/relation.py#L50
This should also use the field method to set the empty tuple as a default. The test should be updated (or a second one should be created) to check this is configured correctly as well.

The fix needs to be implemented in dbt-core>=1.6 and dbt-adapters>=1.8. The tests need to be implemented in dbt-snowflake>=1.6, dbt-redshift>=1.6, and dbt-bigquery>=1.7.

Risks

We've been operating under the assumption that we've been using the renameable_relations and replaceable_relations as expected. However, since these were both always False, we may be introducing new logic paths by fixing this bug. Put another way, resolving this bug is effectively implementing renameable_relations, replaceable_relations, and the associated replace macro in some places.

@dbeatty10
Copy link
Contributor

That is some great info @mikealfare and @VersusFacit 🧠

While we're in this area of code, here's one other thing that @VersusFacit and I noticed in #693 (comment):

  • in the last statement, my_model__dbt_backup is dropped if it exists.

It doesn't appear to ever get created, so it's a no-op. But it looks like it may be an unnecessary statement.

Maybe it will be naturally handled along with the other fixes?

@VersusFacit
Copy link
Contributor

I dont think it will be since it's an always drop. I think it's just a leftover artifact. I can make a trick to track dealing with this at some point :)

@gtmehrer
Copy link
Author

gtmehrer commented Mar 5, 2024

Thanks so much for digging into this! @VersusFacit I did try with autocommit set to both True and False with the same results either way.

Our use case is that we refresh many models frequently throughout the day, around every 30 mins. Outside processes querying dbt managed tables will fail when they attempt to reference a table that has been dropped (when the transaction is immediately commited by dbt). The table is generally only gone for a few seconds before it's recreated by another transaction so it's a little difficult to replicate without high query volume.

Also, it just occurred to me that we are using Redshift's snapshot isolation, which may result in a different error than serializable isolation would. https://docs.aws.amazon.com/redshift/latest/dg/r_STV_DB_ISOLATION_LEVEL.html

Let me know if I can provide more info or help with testing!

@mikealfare mikealfare self-assigned this Mar 21, 2024
@mikealfare
Copy link
Contributor

We merged the fix today. We'll need to do a patch release before it's available on PyPI, which should happen in the next week. Feel free to install from branch 1.7.latest if you want to test it in the meantime.

Closing as resolved.

@AlessandroLollo
Copy link

@mikealfare has the fix been included in dbt-redshift==1.7.5?
Thanks! 🙌

@mikealfare
Copy link
Contributor

@mikealfare has the fix been included in dbt-redshift==1.7.5? Thanks! 🙌

It is. We published a few hours ago.

@AlessandroLollo
Copy link

Thank you @mikealfare 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe regression support
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants