Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UPDATE operation is missing a useful AdapterResponse #68

Closed
rjh336 opened this issue Nov 15, 2021 · 1 comment · Fixed by #79
Closed

UPDATE operation is missing a useful AdapterResponse #68

rjh336 opened this issue Nov 15, 2021 · 1 comment · Fixed by #79
Labels
good_first_issue Good for newcomers type:bug Something isn't working

Comments

@rjh336
Copy link
Contributor

rjh336 commented Nov 15, 2021

Describe the bug

When calling load_result() from a statement block connecting to a bigquery db, if the query is an UPDATE operation, we do not get an AdapterResponse on par with other DML operations.

Seems like it should be included here:

elif query_job.statement_type in ['INSERT', 'DELETE', 'MERGE']:

Moreover, what is the reason that SELECT statements do not also get a filled out AdapterResponse?

Steps To Reproduce

{% macro test_response_object() %}
{% call statement('test') %}
... UPDATE STATEMENT ...
{% endcall %}
{% set result = load_result('test') %}
{{ dbt_utils.log_info(result) }}
{% endmacro %}

From the command line:
dbt run-operation test_response_object

Result:

{'response': BigQueryAdapterResponse(_message='OK', code=None, rows_affected=None, bytes_processed=None), 'data': [], 'table': <agate.table.Table object at 0x114f1cd90>}

Expected behavior

BigQueryAdapterResponse field should be populated with values reported by the BigQuery API. Here is an example of what you get when you run a DELETE:

{'response': BigQueryAdapterResponse(_message='DELETE (1.0 rows, 352.5 KB processed)', code='DELETE', rows_affected=1, bytes_processed=360929), 'data': [], 'table': <agate.table.Table object at 0x1065e82b0>}

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

The output of dbt --version:

installed version: 0.21.0
   latest version: 0.21.0

Up to date!

Plugins:
  - bigquery: 0.21.0
  - snowflake: 0.21.0
  - redshift: 0.21.0
  - postgres: 0.21.0

The operating system you're using:
macOS v 11.6 (20G165)

The output of python --version:
Python 3.8.11

@rjh336 rjh336 added type:bug Something isn't working triage:product labels Nov 15, 2021
@jtcohen6
Copy link
Contributor

@rjh336 Thanks for opening the issue!

The reason these values are missing for SELECT + UPDATE statements is because we don't have logic to handle those statement types within execute:

message = 'OK'
code = None
num_rows = None
bytes_processed = None
if query_job.statement_type == 'CREATE_VIEW':
code = 'CREATE VIEW'
elif query_job.statement_type == 'CREATE_TABLE_AS_SELECT':
conn = self.get_thread_connection()
client = conn.handle
query_table = client.get_table(query_job.destination)
code = 'CREATE TABLE'
num_rows = query_table.num_rows
num_rows_formated = self.format_rows_number(num_rows)
bytes_processed = query_job.total_bytes_processed
processed_bytes = self.format_bytes(bytes_processed)
message = f'{code} ({num_rows_formated} rows, {processed_bytes} processed)'
elif query_job.statement_type == 'SCRIPT':
code = 'SCRIPT'
bytes_processed = query_job.total_bytes_processed
message = f'{code} ({self.format_bytes(bytes_processed)} processed)'
elif query_job.statement_type in ['INSERT', 'DELETE', 'MERGE']:
code = query_job.statement_type
num_rows = query_job.num_dml_affected_rows
num_rows_formated = self.format_rows_number(num_rows)
bytes_processed = query_job.total_bytes_processed
processed_bytes = self.format_bytes(bytes_processed)
message = f'{code} ({num_rows_formated} rows, {processed_bytes} processed)'
response = BigQueryAdapterResponse(
_message=message,
rows_affected=num_rows,
code=code,
bytes_processed=bytes_processed
)
return response, table

We totally could! From a little poking around, I think:

  • UPDATE should be a simple addition to the conditional branch that includes ['INSERT', 'DELETE', 'MERGE']
  • SELECT needs its own new conditional branch, which might look something like:
        elif query_job.statement_type == 'SELECT':
            code = 'SELECT'
            bytes_processed = query_job.total_bytes_processed
            
            # these are internal properties, so we should be cautious
            # maybe more trouble than it's worth?
            if (
                hasattr(query_job, '_query_results') 
                and hasattr(query_job._query_results, '_properties')
                and hasattr(query_job._query_results._properties, 'get')
            ):
                num_rows = query_job._query_results._properties.get('totalRows')

I'm going to mark this a good first issue, and I'd welcome a PR to add those in :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good_first_issue Good for newcomers type:bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants