Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-671] Statements executed one by one instead of as a whole block #646

Open
2 tasks done
naumovdalex opened this issue Jul 6, 2023 · 6 comments
Open
2 tasks done
Labels
feature:transactions Issues related to managing database transactions pkg:dbt-redshift Issue affects dbt-redshift type:bug Something isn't working as documented type:regression Something used to work and is no longer working

Comments

@naumovdalex
Copy link

naumovdalex commented Jul 6, 2023

Is this a regression in a recent version of dbt-redshift?

  • I believe this is a regression in dbt-redshift functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

We have a macro in dbt that runs a set of DDL statements on Redshift. We concatenate those statements in a string (cmd) and run with "{% do run_query(cmd) %}". That string looks like below;

begin;
  ddl_statement_1;
  ddl_statement_2;
  ddl_statement_3;
  ...
  ...
  ddl_statement_x;
end;

After moving to dbt 1.5, we see that those statements are executed one by one, instead of sending the statement as a whole to database. We can see it both in database logs and dbt logs.

Impact

In 1.5 and onwards, these statements take more than 5 minutes to execute. Previously in versions 1.4 and below, they would execute in only a few seconds.

Expected/Previous Behavior

The below statements are executed at once in a single transaction.

begin;
  ddl_statement_1;
  ddl_statement_2;
  ddl_statement_3;
  ...
  ...
  ddl_statement_x;
end;

Steps To Reproduce

  1. Install dbt-core 1.5
  2. Install dot-redshift 1.5.6
  3. Run a set of ddl statements between begin and end in a macro with "{% do run_query(cmd) %}"

Relevant log output

[0m14:34:35.967673 [debug] [MainThread]: On macro_xyz: drop view if exists db_name.schema_name.object_1;
[0m14:34:36.158399 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.158975 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.159297 [debug] [MainThread]: On macro_xyz: create view db_name.schema_name.object_1 as select * from db_name.another_schema_name.object_1 with no schema binding;
[0m14:34:36.528864 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.531540 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.532585 [debug] [MainThread]: On macro_xyz: grant select on table db_name.schema_name.object_1 to group db_group;
[0m14:34:36.729741 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.730303 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.730638 [debug] [MainThread]: On macro_xyz: drop view if exists db_name.schema_name.object_2;
[0m14:34:36.923462 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.926900 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.927759 [debug] [MainThread]: On macro_xyz: create view db_name.schema_name.object_2 as select * from db_name.another_schema_name.object_2 with no schema binding;
[0m14:34:37.306524 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:37.307047 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:37.307360 [debug] [MainThread]: On macro_xyz: grant select on table db_name.schema_name.object_2 to group db_group;
[0m14:34:37.514141 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds

Environment

- OS: MacOS 13.4 and Ubuntu
- Python: 3.9.16
- dbt-core (working version): 1.4.6
- dbt-redshift (working version): 1.4.6
- dbt-core (regression version): 1.5.0, 1.5.1, 1.5.2
- dbt-redshift (regression version): 1.5.1, 1.5.2, 1.5.6

Additional Context

No response

@naumovdalex naumovdalex added type:bug Something isn't working as documented type:regression Something used to work and is no longer working triage:product In Product's queue labels Jul 6, 2023
@github-actions github-actions bot changed the title Statements executed one by one instead of as a whole block [ADAP-671] Statements executed one by one instead of as a whole block Jul 6, 2023
@jan-benisek
Copy link

We noticed the same with our post-hook, what helped us to set autocommit: False in profiles.yml (more details here)

@dataders
Copy link
Contributor

dataders commented Jul 7, 2023

@naumovdalex sorry to hear your existing workflows is failing. As part of the migration to using the redshift_connector connector library in 1.5 instead of psycopg2, this adapter's approach to transactions have changed such that every query is executed at once.

Check out dbt-labs/dbt-redshift#463 for more context, and I'd love to see if you get the the intended behavior by doing as @jan-benisek says and adding autocommit: False to your profiles.yml.

I suspect that you may still encounter issues due to design decisions of the underlying connector library, for example see aws/amazon-redshift-python-driver#162.

@dataders dataders removed type:bug Something isn't working as documented triage:product In Product's queue type:regression Something used to work and is no longer working labels Jul 13, 2023
@dataders
Copy link
Contributor

@naumovdalex I'm going to close this issue for now. If possible, I think this would make a great reproducible use case to rationalize aws/amazon-redshift-python-driver#162. Perhaps open an issue there detailing the limitation?

@naumovdalex
Copy link
Author

We noticed the same with our post-hook, what helped us to set autocommit: False in profiles.yml (more details here)

I tried this, unfortunately that didn't help as well. Also, as part of our team uses dbt Cloud IDE as well, this would be a bit more complicated for us

@naumovdalex
Copy link
Author

@naumovdalex sorry to hear your existing workflows is failing. As part of the migration to using the redshift_connector connector library in 1.5 instead of psycopg2, this adapter's approach to transactions have changed such that every query is executed at once.

Check out dbt-labs/dbt-redshift#463 for more context, and I'd love to see if you get the the intended behavior by doing as @jan-benisek says and adding autocommit: False to your profiles.yml.

I suspect that you may still encounter issues due to design decisions of the underlying connector library, for example see aws/amazon-redshift-python-driver#162.

@dataders it's not that the workflows fail, but some macros are taking more than 5 minutes to execute, instead of just a few seconds

@dataders
Copy link
Contributor

I understand now -- that not an ideal experience. @naumovdalex can you please provide a reproducible example? This would help the engineer at Redshift work on the above linked issue on their connector library

@dataders dataders reopened this Apr 18, 2024
@dataders dataders added feature:transactions Issues related to managing database transactions type:bug Something isn't working as documented type:regression Something used to work and is no longer working labels Apr 18, 2024
@mikealfare mikealfare added the pkg:dbt-redshift Issue affects dbt-redshift label Jan 15, 2025
@mikealfare mikealfare transferred this issue from dbt-labs/dbt-redshift Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature:transactions Issues related to managing database transactions pkg:dbt-redshift Issue affects dbt-redshift type:bug Something isn't working as documented type:regression Something used to work and is no longer working
Projects
None yet
Development

No branches or pull requests

4 participants